Elasticsearch

Configuration Parameters

By default, the Mage table name is used to set the Elasticsearch index name.

| Key | Description | Default | Required |
| --- | --- | --- | --- |
| `scheme` | HTTP scheme used to connect to Elasticsearch (`http` or `https`). | `http` | ✅ |
| `host` | Elasticsearch host. Can be a domain or IP address. | `localhost` | ✅ |
| `port` | Elasticsearch port. | `9200` | ✅ |
| `username` | Username for basic authentication. | `None` | ❌ |
| `password` | Password for basic authentication. | `None` | ❌ |
| `bearer_token` | Bearer token for token-based authentication. | `None` | ❌ |
| `api_key_id` | API key ID for key-based authentication. | `None` | ❌ |
| `api_key` | API key for key-based authentication. | `None` | ❌ |
| `ssl_ca_file` | Path to SSL certificate file used for TLS verification. | `None` | ❌ |
| `verify_certs` | Whether to verify SSL certificates when using HTTPS. | `True` | ❌ |
| `index_schema_fields` | JSONPath mapping for selecting values from the record to generate index names dynamically. | `None` | ❌ |
| `_op_type` | Operation type for the Elasticsearch request. Typically `index`, but can be `create`, `update`, or `delete`. | `index` | ❌ |
| `metadata_fields` | Dictionary used to map fields in the input record to metadata (e.g. `_id`) in the Elasticsearch index request. | `None` | ❌ |
| `bulk_kwargs` | Settings that control Elasticsearch `bulk` behavior. Useful for tuning performance and retries. See table below. | `None` | ❌ |

Use metadata_fields to extract values from incoming records and apply them to Elasticsearch document metadata (e.g., _id):

metadata_fields:
  <stream_name>:
    <field>: <jsonpath>

Example:

metadata_fields:
  stream_name:
    _id: my_id

This will set the _id of each indexed document to the value of the my_id field in the input record.

The `bulk_kwargs` settings below are all optional. Use them to control indexing performance and fault tolerance.

| Key | Description | Default |
| --- | --- | --- |
| `use_parallel` | If `true`, enables parallel bulk indexing using multiple threads. Can significantly improve throughput. | `false` |
| `chunk_size` | Max number of records per bulk request. | `500` |
| `max_chunk_bytes` | Max size in bytes of each bulk request payload. | `104857600` |
| `max_retries` | (Only with `use_parallel: false`) Number of retry attempts on HTTP 429 (too many requests) errors. | `0` |
| `initial_backoff` | (Only with `use_parallel: false`) Initial wait time (in seconds) before retry. Each subsequent retry backs off exponentially. | `2` |
| `max_backoff` | (Only with `use_parallel: false`) Maximum time to wait before retrying. | `600` |
| `thread_count` | (Only with `use_parallel: true`) Number of threads to use in the bulk operation thread pool. | `4` |
| `queue_size` | (Only with `use_parallel: true`) Size of the task queue buffering chunks between the main thread and processing threads. | `4` |
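
As a rough illustration of the sequential-mode settings in the table, the fragment below turns on retries for HTTP 429 responses. The values are arbitrary choices for illustration, not recommendations. If each retry doubles the previous wait (an assumption about how the exponential backoff is applied), three retries would pause for roughly 2, 4, and 8 seconds, and no single wait would exceed `max_backoff`.

```yaml
bulk_kwargs:
  use_parallel: false   # the retry settings below only apply in sequential mode
  chunk_size: 500       # default: records per bulk request
  max_retries: 3        # illustrative value; the default of 0 means no retries
  initial_backoff: 2    # seconds to wait before the first retry
  max_backoff: 600      # upper bound, in seconds, on any single wait
```

With `use_parallel: true`, these three retry keys would not apply, and `thread_count` and `queue_size` become the relevant knobs instead.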

Example configuration:

scheme: http
host: localhost
port: 9200
metadata_fields:
  stream_name:
    _id: my_id
bulk_kwargs:
  use_parallel: true
  chunk_size: 500

You can authenticate with basic auth (`username` and `password`), a bearer token (`bearer_token`), or an API key (`api_key_id` and `api_key`).
Set `_op_type: create` if you want to avoid overwriting documents with the same `_id`.
For production environments, use `https`, keep `verify_certs` enabled, and set `ssl_ca_file` to your certificate file.
Performance tuning with bulk_kwargs is especially helpful for large datasets or real-time indexing.
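
Putting these notes together, a production-leaning configuration might look like the sketch below. The host name, certificate path, credentials, and the `my_id` field are placeholders, and API-key authentication is shown only as one of the three available options.

```yaml
scheme: https
host: es.example.internal        # placeholder host
port: 9200
api_key_id: <your_api_key_id>    # placeholder credential
api_key: <your_api_key>          # placeholder credential
verify_certs: true               # default, stated explicitly for clarity
ssl_ca_file: /path/to/ca.crt     # placeholder path to the certificate file
_op_type: create                 # avoid overwriting documents that share an _id
metadata_fields:
  stream_name:
    _id: my_id                   # take the document _id from the record's my_id field
bulk_kwargs:
  use_parallel: true
  thread_count: 4                # default thread pool size
```

Pairing `_op_type: create` with an `_id` taken from the record is a deliberate choice here: without a record-derived `_id`, there would be no stable identifier for the overwrite check to act on.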