Overview
The MySQL CDC streaming source captures real-time changes from MySQL binary logs and processes them into structured records. It provides comprehensive CDC capabilities with automatic schema evolution tracking and primary key detection.Real-time CDC
Captures changes as they happen in MySQL
Schema Evolution
Tracks table structure changes automatically
Primary Key Detection
Automatically identifies and extracts primary key columns
Error Recovery
Handles server ID conflicts and binlog file issues
Configuration
Required Parameters
Parameter | Type | Description |
---|---|---|
host | string | MySQL server hostname or IP address |
port | number | MySQL server port (default: 3306) |
user | string | MySQL username with replication privileges |
password | string | MySQL password |
Optional Parameters
Parameter | Type | Default | Description |
---|---|---|---|
connect_timeout | number | 10 | Connection timeout in seconds |
charset | string | utf8mb4 | Character set for connection |
databases | array | - | Include only these databases |
tables | array | - | Include only these tables |
ignore_databases | array | - | Exclude these databases |
ignore_tables | array | - | Exclude these tables |
server_id | number | 54001 | Unique server ID for replication |
auto_generate_server_id | boolean | true | Auto-generate unique server_id if conflict |
resume_from_gtids | string | - | Resume from specific GTID set |
start_log_file | string | - | Resume from specific binlog file |
start_log_pos | number | - | Resume from specific position in binlog file |
start_timestamp | number | - | Resume from specific timestamp (Unix timestamp) |
include_ddl | boolean | false | Include DDL events (CREATE, ALTER, DROP, etc.) |
include_heartbeat | boolean | true | Include heartbeat events |
include_transaction_events | boolean | true | Include transaction commit events |
include_gtid_events | boolean | true | Include GTID events |
include_rotate_events | boolean | false | Include binlog rotation events |
include_intvar_events | boolean | true | Include integer variable events |
include_load_query_events | boolean | true | Include LOAD DATA INFILE events |
include_table_map_events | boolean | true | Include table map events (required for column mapping) |
heartbeat_seconds | number | 1.0 | Heartbeat interval in seconds |
blocking | boolean | true | Use blocking mode for event reading |
max_batch_size | number | 100 | Maximum events per batch |
flush_interval_seconds | number | 1.0 | Maximum time between batch flushes |
return_db_records_only | boolean | true | Return only DB records with Mage timestamp columns |
SSL Configuration
Parameter | Type | Description |
---|---|---|
ssl.ca | string | Certificate Authority file path |
ssl.cert | string | Client certificate file path |
ssl.key | string | Client private key file path |
Record Formats
Standard Event Format
Whenreturn_db_records_only: false
:
DB Records Only Format
Whenreturn_db_records_only: true
:
Event Types
Row Events (Core CDC)
Row Events (Core CDC)
- WriteRowsEvent (
op: "c"
): INSERT operations - UpdateRowsEvent (
op: "u"
): UPDATE operations with before/after data - DeleteRowsEvent (
op: "d"
): DELETE operations
DDL Events
DDL Events
- QueryEvent (
op: "ddl"
): DDL statements (CREATE, ALTER, DROP) - Automatically clears table schema cache when DDL occurs
- Extracts table name from SQL for targeted cache clearing
Transaction Events
Transaction Events
- XidEvent (
op: "transaction"
): Transaction commit events - Useful for maintaining transaction boundaries
Replication Events
Replication Events
- GtidEvent (
op: "gtid"
): Global Transaction Identifier events - RotateEvent (
op: "rotate"
): Binlog file rotation events - HeartbeatLogEvent (
op: "heartbeat"
): Replication heartbeat events
Metadata Events
Metadata Events
- TableMapEvent (
op: "table_map"
): Table structure metadata - IntvarEvent (
op: "intvar"
): Integer variable changes - LoadQueryEvent (
op: "begin_load_query"
/op: "execute_load_query"
): LOAD DATA INFILE events
Primary Key Detection
The source automatically detects and extracts primary key columns:- Schema Discovery: Queries
INFORMATION_SCHEMA
for table structure - Primary Key Detection: Identifies actual primary key columns from database constraints
- Caching: Stores schema and primary key information for performance
- Key Extraction: Extracts primary key values from row events
Examples
Prerequisites
MySQL Server Configuration
User Permissions
Troubleshooting
Common Issues
Common Issues
- Server ID Conflict: Enable
auto_generate_server_id: true
- Binlog File Not Found: Check if binlog file exists and is accessible
- Permission Denied: Ensure user has replication privileges
- Connection Timeout: Increase
connect_timeout
value - Schema Not Found: Ensure
include_table_map_events: true
Debugging
Debugging
Enable debug logging to see detailed event processing:
Monitoring
Monitoring
Monitor key metrics:
- Batch Size: Average events per batch
- Flush Rate: How often batches are flushed
- Error Rate: Failed events or connections
- Lag: Time between event occurrence and processing
Best Practices
- Use GTID: Enable GTID for reliable resume capabilities
- Filter Events: Disable unnecessary event types for performance
- Monitor Resources: Watch memory and CPU usage
- Test Resume: Verify checkpoint functionality
- Secure Connections: Use SSL in production
- Regular Backups: Backup checkpoint files
- Schema Validation: Test with schema changes
Limitations
- MySQL 5.7+: Requires MySQL 5.7 or later
- ROW Format: Requires
binlog_format = ROW
- Network Dependency: Requires stable network connection
- Memory Usage: Schema caching uses memory
- Binlog Retention: Depends on MySQL binlog retention settings