Configuration Parameters

By default, the Mage table name is used to set the Elasticsearch index name.

KeyDescriptionDefaultRequired
schemeHTTP scheme used to connect to Elasticsearch (http or https).http
hostElasticsearch host. Can be a domain or IP address.localhost
portElasticsearch port.9200
usernameUsername for basic authentication.None
passwordPassword for basic authentication.None
bearer_tokenBearer token for token-based authentication.None
api_key_idAPI key ID for key-based authentication.None
api_keyAPI key for key-based authentication.None
ssl_ca_filePath to SSL certificate file used for TLS verification.None
verify_certsWhether to verify SSL certificates when using HTTPS.True
index_schema_fieldsJSONPath mapping for selecting values from the record to generate index names dynamically.None
_op_typeOperation type for the Elasticsearch request. Typically index, but can be create, update, or delete.index
metadata_fieldsDictionary used to map fields in the input record to metadata (e.g. _id) in the Elasticsearch index request.None
bulk_kwargsSettings that control Elasticsearch bulk behavior. Useful for tuning performance and retries. See table below.None

Metadata Field Mapping

Use metadata_fields to extract values from incoming records and apply them to Elasticsearch document metadata (e.g., _id):

metadata_fields:
  <stream_name>:
    <field>: <jsonpath>

Example:

metadata_fields:
  stream_name:
    _id: my_id

This will set the _id of each indexed document to the value of the my_id field in the input record.


Bulk Operation Settings (bulk_kwargs)

All options below are optional, but useful for controlling indexing performance and fault tolerance.

KeyDescriptionDefault
use_parallelIf true, enables parallel bulk indexing using multiple threads. Can significantly improve throughput.false
chunk_sizeMax number of records per bulk request.500
max_chunk_bytesMax size in bytes of each bulk request payload.104857600
max_retries(Only with use_parallel: false) Number of retry attempts on HTTP 429 (too many requests) errors.0
initial_backoff(Only with use_parallel: false) Initial wait time (in seconds) before retry. Each subsequent retry backs off exponentially.2
max_backoff(Only with use_parallel: false) Maximum time to wait before retrying.600
thread_count(Only with use_parallel: true) Number of threads to use in the bulk operation thread pool.4
queue_size(Only with use_parallel: true) Size of the task queue buffering chunks between the main thread and processing threads.4

Example Configuration

scheme: http
host: localhost
port: 9200
metadata_fields:
  stream_name:
    _id: my_id
bulk_kwargs:
  use_parallel: true
  chunk_size: 500

Notes

  • You can authenticate with basic auth, bearer tokens, or API keys.
  • Set _op_type: create if you want to avoid overwriting documents with the same _id.
  • For production environments, it’s recommended to use https, enable verify_certs, and configure ssl_ca_file.
  • Performance tuning with bulk_kwargs is especially helpful for large datasets or real-time indexing.