Only in Mage Pro.Try our fully managed solution to access this advanced feature.
Overview
The MySQL CDC streaming source captures real-time changes from MySQL binary logs and processes them into structured records. It provides comprehensive CDC capabilities with automatic schema evolution tracking and primary key detection.Real-time CDC
Captures changes as they happen in MySQL
Schema Evolution
Tracks table structure changes automatically
Primary Key Detection
Automatically identifies and extracts primary key columns
Error Recovery
Handles server ID conflicts and binlog file issues
Configuration
Required Parameters
| Parameter | Type | Description |
|---|---|---|
host | string | MySQL server hostname or IP address |
port | number | MySQL server port (default: 3306) |
user | string | MySQL username with replication privileges |
password | string | MySQL password |
Optional Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
connect_timeout | number | 10 | Connection timeout in seconds |
charset | string | utf8mb4 | Character set for connection |
databases | array | - | Include only these databases |
tables | array | - | Include only these tables |
ignore_databases | array | - | Exclude these databases |
ignore_tables | array | - | Exclude these tables |
server_id | number | 54001 | Unique server ID for replication |
auto_generate_server_id | boolean | true | Auto-generate unique server_id if conflict |
resume_from_gtids | string | - | Resume from specific GTID set |
start_log_file | string | - | Resume from specific binlog file |
start_log_pos | number | - | Resume from specific position in binlog file |
start_timestamp | number | - | Resume from specific timestamp (Unix timestamp) |
include_ddl | boolean | false | Include DDL events (CREATE, ALTER, DROP, etc.) |
include_heartbeat | boolean | true | Include heartbeat events |
include_transaction_events | boolean | true | Include transaction commit events |
include_gtid_events | boolean | true | Include GTID events |
include_rotate_events | boolean | false | Include binlog rotation events |
include_intvar_events | boolean | true | Include integer variable events |
include_load_query_events | boolean | true | Include LOAD DATA INFILE events |
include_table_map_events | boolean | true | Include table map events (required for column mapping) |
heartbeat_seconds | number | 1.0 | Heartbeat interval in seconds |
blocking | boolean | true | Use blocking mode for event reading |
max_batch_size | number | 100 | Maximum events per batch |
flush_interval_seconds | number | 1.0 | Maximum time between batch flushes |
return_db_records_only | boolean | true | Return only DB records with Mage timestamp columns |
SSL Configuration
| Parameter | Type | Description |
|---|---|---|
ssl.ca | string | Certificate Authority file path |
ssl.cert | string | Client certificate file path |
ssl.key | string | Client private key file path |
Record Formats
Standard Event Format
Whenreturn_db_records_only: false:
DB Records Only Format
Whenreturn_db_records_only: true:
_mage_* timestamp columns are returned as datetime objects (ISO 8601 format strings when serialized) in UTC timezone, not Unix timestamps. This ensures proper type handling when writing to databases.
Event Types
Row Events (Core CDC)
Row Events (Core CDC)
- WriteRowsEvent (
op: "c"): INSERT operations - UpdateRowsEvent (
op: "u"): UPDATE operations with before/after data - DeleteRowsEvent (
op: "d"): DELETE operations
DDL Events
DDL Events
- QueryEvent (
op: "ddl"): DDL statements (CREATE, ALTER, DROP) - Automatically clears table schema cache when DDL occurs
- Extracts table name from SQL for targeted cache clearing
Transaction Events
Transaction Events
- XidEvent (
op: "transaction"): Transaction commit events - Useful for maintaining transaction boundaries
Replication Events
Replication Events
- GtidEvent (
op: "gtid"): Global Transaction Identifier events - RotateEvent (
op: "rotate"): Binlog file rotation events - HeartbeatLogEvent (
op: "heartbeat"): Replication heartbeat events
Metadata Events
Metadata Events
- TableMapEvent (
op: "table_map"): Table structure metadata - IntvarEvent (
op: "intvar"): Integer variable changes - LoadQueryEvent (
op: "begin_load_query"/op: "execute_load_query"): LOAD DATA INFILE events
Primary Key Detection
The source automatically detects and extracts primary key columns:- Schema Discovery: Queries
INFORMATION_SCHEMAfor table structure - Primary Key Detection: Identifies actual primary key columns from database constraints
- Caching: Stores schema and primary key information for performance
- Key Extraction: Extracts primary key column names from row events
key_columns is a list of column names (not values) that can be used for deduplication or upsert operations in downstream sinks.
Examples
Prerequisites
MySQL Server Configuration
User Permissions
Troubleshooting
Common Issues
Common Issues
- Server ID Conflict: Enable
auto_generate_server_id: true - Binlog File Not Found: Check if binlog file exists and is accessible
- Permission Denied: Ensure user has replication privileges
- Connection Timeout: Increase
connect_timeoutvalue - Schema Not Found: Ensure
include_table_map_events: true
Debugging
Debugging
Enable debug logging to see detailed event processing:
Monitoring
Monitoring
Monitor key metrics:
- Batch Size: Average events per batch
- Flush Rate: How often batches are flushed
- Error Rate: Failed events or connections
- Lag: Time between event occurrence and processing
Integration with Generic IO Sink
The MySQL CDC source works seamlessly with the Generic IO sink, which provides:- Automatic Column Type Mapping: Automatically maps
_mage_*timestamp columns to appropriate database types (TIMESTAMP, DATETIME2, DateTime64, etc.) based on the target database - Metadata Interpolation: Use metadata values from MySQL CDC events in sink configurations for dynamic routing and table naming
Supported Databases
Generic IO Sink supports the following databases:- BigQuery
- ClickHouse
- DuckDB
- MySQL
- MSSQL
- Postgres
Metadata Interpolation
You can use metadata values from MySQL CDC events in your sink configuration using Python string formatting syntax:{schema}: Database/schema name from the event{table}: Table name from the event{key_columns}: List of primary key column names (e.g.,["id"]or["user_id", "tenant_id"])
{key_columns} in unique_constraints, it will be automatically converted from a string representation to a list. The format supports both Python-style ("['id']") and JSON-style ('["id"]') array strings.
Example Configurations
How Metadata Interpolation Works
-
Message Grouping: Messages are automatically grouped by their interpolated config values. For example, if you use
table_name: "{schema}_{table}", messages frommydb.userswill be grouped together and written tomydb_userstable. -
Key Columns Interpolation: When using
{key_columns}inunique_constraints, the sink automatically:- Converts the list to a string representation during interpolation
- Parses it back to a list (supports both
"['id']"and'["id"]'formats) - Uses it for upsert operations based on
unique_conflict_method
users with primary key id, and you configure unique_constraints: "{key_columns}", it will automatically use ["id"] for upsert operations.
Best Practices
- Use GTID: Enable GTID for reliable resume capabilities
- Filter Events: Disable unnecessary event types for performance
- Monitor Resources: Watch memory and CPU usage
- Test Resume: Verify checkpoint functionality
- Secure Connections: Use SSL in production
- Regular Backups: Backup checkpoint files
- Schema Validation: Test with schema changes
- Use Generic IO Sink: Leverage automatic type mapping and metadata interpolation for flexible data routing
- Metadata Interpolation: Use
{schema},{table}, and{key_columns}for dynamic table routing and upsert configuration
Timestamp Handling
The MySQL CDC source automatically converts Unix timestamps to datetime objects:_mage_created_at: Set to event timestamp (datetime) for INSERT operations_mage_updated_at: Set to event timestamp (datetime) for UPDATE operations_mage_deleted_at: Set to event timestamp (datetime) for DELETE operations
- Proper type handling in downstream databases
- Automatic type conversion in Generic IO sink
- Consistent timezone handling across systems
Limitations
- MySQL 5.7+: Requires MySQL 5.7 or later
- ROW Format: Requires
binlog_format = ROW - Network Dependency: Requires stable network connection
- Memory Usage: Schema caching uses memory
- Binlog Retention: Depends on MySQL binlog retention settings
- Timezone: All timestamps are in UTC timezone