Storage
ClickHouse
Add credentials
- Create a new pipeline or open an existing pipeline.
- Expand the left side of your screen to view the file browser.
- Scroll down and click the file named `io_config.yaml`.
- Enter the following keys and values under the key named `default` (you can have multiple profiles; add them under whichever profile is relevant to you):
```yaml
version: 0.1.1
default:
  # ClickHouse
  CLICKHOUSE_DATABASE: ...
  CLICKHOUSE_HOST: ...
  CLICKHOUSE_INTERFACE: http
  CLICKHOUSE_PASSWORD: ...
  CLICKHOUSE_PORT: 8123
  CLICKHOUSE_USERNAME: ...
```
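Before wiring the profile into a pipeline, it can be handy to confirm the profile actually contains every key the connector expects. The sketch below is not part of Mage; it is a minimal, standalone check using PyYAML, with an inline sample config standing in for your real `io_config.yaml`:

```python
import yaml  # PyYAML

# Sample config standing in for your real io_config.yaml.
SAMPLE_CONFIG = """
version: 0.1.1
default:
  CLICKHOUSE_DATABASE: analytics
  CLICKHOUSE_HOST: localhost
  CLICKHOUSE_INTERFACE: http
  CLICKHOUSE_PASSWORD: secret
  CLICKHOUSE_PORT: 8123
  CLICKHOUSE_USERNAME: default
"""

# The keys the ClickHouse profile should define.
REQUIRED_KEYS = {
    'CLICKHOUSE_DATABASE',
    'CLICKHOUSE_HOST',
    'CLICKHOUSE_INTERFACE',
    'CLICKHOUSE_PASSWORD',
    'CLICKHOUSE_PORT',
    'CLICKHOUSE_USERNAME',
}


def missing_clickhouse_keys(config_text: str, profile: str = 'default') -> set:
    """Return the ClickHouse keys absent from the given profile."""
    config = yaml.safe_load(config_text) or {}
    return REQUIRED_KEYS - set(config.get(profile) or {})


print(missing_clickhouse_keys(SAMPLE_CONFIG))  # set() -> nothing missing
```

To check your actual file, read it with `open(path.join(get_repo_path(), 'io_config.yaml'))` and pass its contents to the same function.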
Using SQL block
- Create a new pipeline or open an existing pipeline.
- Add a data loader, transformer, or data exporter block.
- Select `SQL`.
- Under the `Data provider` dropdown, select `ClickHouse`.
- Under the `Profile` dropdown, select `default` (or the profile you added credentials under).
- Enter the optional table name of the table to write to.
- Under the `Write policy` dropdown, select `Replace` or `Append` (please see the SQL blocks guide for more information on write policies).
- Enter this test query:
```sql
SELECT 1
```
- Run the block.
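If the difference between the two write policies is unclear, this plain-pandas illustration (not Mage's actual implementation) shows the effect each one has on a destination table held in memory: `Replace` discards the existing rows, while `Append` adds new rows after them.

```python
import pandas as pd


def write(existing: pd.DataFrame, new: pd.DataFrame, policy: str) -> pd.DataFrame:
    """Mimic the effect of a write policy on a destination table."""
    if policy == 'replace':
        # Drop whatever was in the table; keep only the new load.
        return new.copy()
    if policy == 'append':
        # Keep existing rows and add the new ones after them.
        return pd.concat([existing, new], ignore_index=True)
    raise ValueError(f'unknown policy: {policy}')


existing = pd.DataFrame({'id': [1, 2]})
new = pd.DataFrame({'id': [3]})

print(len(write(existing, new, 'replace')))  # 1
print(len(write(existing, new, 'append')))   # 3
```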
Using Python block
- Create a new pipeline or open an existing pipeline.
- Add a data loader, transformer, or data exporter block (the code snippet below is for a data loader).
- Select `Generic (no template)`.
- Enter this code snippet (note: change the `config_profile` from `default` if you have a different profile):
```python
from mage_ai.settings.repo import get_repo_path
from mage_ai.io.clickhouse import ClickHouse
from mage_ai.io.config import ConfigFileLoader
from os import path
from pandas import DataFrame

if 'data_loader' not in globals():
    from mage_ai.data_preparation.decorators import data_loader


@data_loader
def load_data_from_clickhouse(**kwargs) -> DataFrame:
    """
    Template for loading data from a ClickHouse database.
    Specify your configuration settings in 'io_config.yaml'.
    """
    query = 'SELECT 1'
    config_path = path.join(get_repo_path(), 'io_config.yaml')
    config_profile = 'default'

    return ClickHouse.with_config(
        ConfigFileLoader(config_path, config_profile),
    ).load(query)
```
- Run the block.
Destination table in Data Exporter
If the destination table does not exist and the `Write policy` is set to `Replace`, the data exporter will automatically create a table in ClickHouse with `ENGINE = Memory` and a default schema inferred from the data. However, this may not be optimal for many use cases. Since table creation in ClickHouse can involve numerous details, it is strongly advised to create the destination table manually before loading data, to ensure it meets your specific requirements.

If you generate the destination table via the Mage export method, you can overwrite specific column types by using the `overwrite_types` dict setting. Example:
```python
overwrite_types = {'column_name': 'VARCHAR(255)'}
```
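When creating the destination table manually, as recommended above, you would run DDL like the following in ClickHouse before the first export. Every name, column, and the engine choice here is a placeholder to adapt; `MergeTree` is shown because, unlike `Memory`, it persists data to disk:

```sql
-- Hypothetical destination table; adjust the database, table, columns,
-- types, and ORDER BY key to match your data.
CREATE TABLE IF NOT EXISTS database_name.table_name
(
    id UInt64,
    event_time DateTime,
    payload String
)
ENGINE = MergeTree
ORDER BY (id, event_time);
```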
Here is the ClickHouse data exporter code snippet that you can use:
```python
from mage_ai.settings.repo import get_repo_path
from mage_ai.io.config import ConfigFileLoader
from mage_ai.io.clickhouse import ClickHouse
from pandas import DataFrame
from os import path

if 'data_exporter' not in globals():
    from mage_ai.data_preparation.decorators import data_exporter


@data_exporter
def export_data_to_clickhouse(df: DataFrame, **kwargs) -> None:
    """
    Template for exporting data to a ClickHouse database.
    Specify your configuration settings in 'io_config.yaml'.
    """
    database = 'database_name'  # Specify the name of the database to export data to
    table_name = 'table_name'  # Specify the name of the table to export data to
    config_path = path.join(get_repo_path(), 'io_config.yaml')
    config_profile = 'default'

    ClickHouse.with_config(ConfigFileLoader(config_path, config_profile)).export(
        df,
        table_name=table_name,
        database=database,
        index=False,  # Specifies whether to include index in exported table
        if_exists='replace',  # Specify resolution policy if table name already exists
    )
```