ClickHouse — open source distributed column-oriented DBMS

Add credentials

  1. Create a new pipeline or open an existing pipeline.
  2. Expand the left side of your screen to view the file browser.
  3. Scroll down and click on a file named io_config.yaml.
  4. Enter the following keys and values under the key named default (you can have multiple profiles; add these keys under whichever profile is relevant to you):
version: 0.1.1
default:
  # ClickHouse
  CLICKHOUSE_DATABASE: ...
  CLICKHOUSE_HOST: ...
  CLICKHOUSE_INTERFACE: http
  CLICKHOUSE_PASSWORD: ...
  CLICKHOUSE_PORT: 8123
  CLICKHOUSE_USERNAME: ...
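
For reference, a filled-in configuration might look like the following. Every value here is a placeholder for illustration (including the extra dev profile), not a real default:

version: 0.1.1
default:
  # ClickHouse
  CLICKHOUSE_DATABASE: analytics
  CLICKHOUSE_HOST: clickhouse.example.com
  CLICKHOUSE_INTERFACE: http
  CLICKHOUSE_PASSWORD: my_password
  CLICKHOUSE_PORT: 8123
  CLICKHOUSE_USERNAME: my_user
dev:
  # A second profile, e.g. for a local development server
  CLICKHOUSE_DATABASE: analytics_dev
  CLICKHOUSE_HOST: localhost
  CLICKHOUSE_INTERFACE: http
  CLICKHOUSE_PASSWORD: my_dev_password
  CLICKHOUSE_PORT: 8123
  CLICKHOUSE_USERNAME: default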

Using SQL block

  1. Create a new pipeline or open an existing pipeline.
  2. Add a data loader, transformer, or data exporter block.
  3. Select SQL.
  4. Under the Data provider dropdown, select ClickHouse.
  5. Under the Profile dropdown, select default (or the profile you added credentials underneath).
  6. Optionally, enter the name of the table to write to.
  7. Under the Write policy dropdown, select Replace or Append (please see SQL blocks guide for more information on write policies).
  8. Enter this test query: SELECT 1.
  9. Run the block.
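
Once SELECT 1 runs successfully, the same block can execute any ClickHouse SQL. A sketch of a more realistic query; the events table and its columns are hypothetical:

SELECT
    event_date,
    count() AS total_events
FROM events
GROUP BY event_date
ORDER BY event_date

With the Replace or Append write policy selected, the result set is written to the table you named in step 6.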

Using Python block

  1. Create a new pipeline or open an existing pipeline.
  2. Add a data loader, transformer, or data exporter block (the code snippet below is for a data loader).
  3. Select Generic (no template).
  4. Enter this code snippet (note: change the config_profile from default if you have a different profile):
from mage_ai.settings.repo import get_repo_path
from mage_ai.io.clickhouse import ClickHouse
from mage_ai.io.config import ConfigFileLoader
from os import path
from pandas import DataFrame

if 'data_loader' not in globals():
    from mage_ai.data_preparation.decorators import data_loader


@data_loader
def load_data_from_clickhouse(**kwargs) -> DataFrame:
    query = 'SELECT 1'
    config_path = path.join(get_repo_path(), 'io_config.yaml')
    config_profile = 'default'

    return ClickHouse.with_config(ConfigFileLoader(config_path, config_profile)).load(query)
  5. Run the block.
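
As a sketch of a more realistic loader, you can build the query from the runtime keyword arguments Mage passes into each block. This reuses the imports from the template above; the events table is hypothetical:

@data_loader
def load_events_for_date(**kwargs) -> DataFrame:
    # Mage provides 'execution_date' (a datetime) in kwargs at runtime.
    execution_date = kwargs['execution_date']
    # 'events' is a hypothetical table name; substitute your own.
    query = f"SELECT * FROM events WHERE event_date = '{execution_date:%Y-%m-%d}'"
    config_path = path.join(get_repo_path(), 'io_config.yaml')
    config_profile = 'default'

    return ClickHouse.with_config(ConfigFileLoader(config_path, config_profile)).load(query)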

Destination table in Data Exporter

If the destination table does not exist and the Write policy is set to Replace, the data exporter will automatically create a table in ClickHouse with Engine = Memory and a default schema inferred from the data. However, this may not be optimal for many use cases. Since table creation in ClickHouse involves many decisions (engine, sorting key, partitioning, and so on), it is strongly advised to create the destination table manually before loading data, to ensure it meets your specific requirements.
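
For example, a manually created destination table would typically use a MergeTree engine rather than Memory. The DDL below is only a sketch with hypothetical table and column names; adjust the engine, columns, and ORDER BY key to your data:

CREATE TABLE IF NOT EXISTS database_name.table_name
(
    id UInt64,
    event_date Date,
    payload String
)
ENGINE = MergeTree
ORDER BY (event_date, id);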

If you generate the destination table via Mage's export method, you can overwrite specific column types by using the overwrite_types dict setting.

Example:

overwrite_types = {'column_name': 'VARCHAR(255)'}

Here is the ClickHouse data exporter code snippet that you can use:

from mage_ai.settings.repo import get_repo_path
from mage_ai.io.config import ConfigFileLoader
from mage_ai.io.clickhouse import ClickHouse
from pandas import DataFrame
from os import path

if 'data_exporter' not in globals():
    from mage_ai.data_preparation.decorators import data_exporter


@data_exporter
def export_data_to_clickhouse(df: DataFrame, **kwargs) -> None:
    """
    Template for exporting data to a Clickhouse database.
    Specify your configuration settings in 'io_config.yaml'.

    """
    database = 'database_name'  # Specify the name of the database to export data to
    table_name = 'table_name'  # Specify the name of the table to export data to
    config_path = path.join(get_repo_path(), 'io_config.yaml')
    config_profile = 'default'

    ClickHouse.with_config(ConfigFileLoader(config_path, config_profile)).export(
        df,
        table_name=table_name,
        database=database,
        index=False,  # Specifies whether to include index in exported table
        if_exists='replace'  # Specify resolution policy if table name already exists
    )
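
If you do let the export method create the table, the overwrite_types setting described above can be passed to export to override the inferred types of specific columns. A sketch, replacing the export call in the snippet above (column_name is a placeholder):

    ClickHouse.with_config(ConfigFileLoader(config_path, config_profile)).export(
        df,
        table_name=table_name,
        database=database,
        index=False,
        if_exists='replace',
        # Override inferred column types; 'column_name' is a placeholder.
        overwrite_types={'column_name': 'VARCHAR(255)'},
    )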