Only in Mage Pro.Try our fully managed solution to access this advanced feature.
Overview
The Microsoft SQL Server CDC streaming source captures real-time changes from SQL Server Change Data Capture (CDC) change tables and processes them into structured records. It uses SQL Server’s built-in CDC feature to poll for changes and provides comprehensive CDC capabilities with automatic schema tracking and primary key detection.Real-time CDC
Captures changes as they happen in SQL Server
Native CDC
Uses SQL Server’s built-in Change Data Capture feature
Primary Key Detection
Automatically identifies and extracts primary key columns
LSN Tracking
Tracks Log Sequence Numbers for reliable resume capabilities
Configuration
Required Parameters
| Parameter | Type | Description |
|---|---|---|
host | string | SQL Server hostname or IP address |
port | number | SQL Server port (default: 1433) |
database | string | Database name |
user | string | SQL Server username with CDC access |
password | string | SQL Server password |
Optional Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
connect_timeout | number | 10 | Connection timeout in seconds |
driver | string | ”ODBC Driver 18 for SQL Server” | ODBC driver name |
authentication | string | - | Authentication method (e.g., “ActiveDirectoryPassword”) |
schema | string | dbo | Default schema name |
schemas | array | - | Include only these schemas |
tables | array | - | Include only these tables |
ignore_schemas | array | - | Exclude these schemas (supports wildcards like “sys*“) |
ignore_tables | array | - | Exclude these tables (supports wildcards like “temp_*“) |
start_lsn | string | - | Start LSN to begin reading from (hex string, e.g., ‘0x00000000000000000000’) |
capture_instance | string | - | Capture instance name (auto-detected if not provided) |
max_batch_size | number | 100 | Maximum events per batch |
flush_interval_seconds | number | 1.0 | Maximum time between batch flushes |
poll_interval_seconds | number | 1.0 | How often to poll for new changes |
return_db_records_only | boolean | true | Return only DB records with Mage timestamp columns |
Record Formats
DB Records Only Format
Whenreturn_db_records_only: true (default):
_mage_* timestamp columns are returned as datetime objects (ISO 8601 format strings when serialized) in UTC timezone, not Unix timestamps. This ensures proper type handling when writing to databases.
Event Types
Row Events (Core CDC)
Row Events (Core CDC)
- Insert (
operation: "INSERT"): INSERT operations (CDC operation code 2) - Update (
operation: "UPDATE"): UPDATE operations (CDC operation code 4, after image only) - Delete (
operation: "DELETE"): DELETE operations (CDC operation code 1)
Primary Key Detection
The source automatically detects and extracts primary key columns:- Schema Discovery: Queries
INFORMATION_SCHEMAfor table structure - Primary Key Detection: Identifies actual primary key columns from database constraints
- Caching: Stores schema and primary key information for performance
- Key Extraction: Extracts primary key column names from row events
key_columns is a list of column names (not values) that can be used for deduplication or upsert operations in downstream sinks.
Examples
Prerequisites
Enable CDC on Database
CDC must be enabled at the database level before it can be used:Enable CDC on Tables
After enabling CDC on the database, enable it for each table you want to monitor:@source_schema: Schema name (e.g., ‘dbo’)@source_name: Table name@role_name: Role name for CDC access control (NULL = no role restriction)@supports_net_changes: 0 = all changes, 1 = net changes only
Verify CDC on Tables
Check which tables have CDC enabled:SQL Server Agent
CDC requires SQL Server Agent to be running. Verify it’s running:User Permissions
The user needs the following permissions:Troubleshooting
Common Issues
Common Issues
- CDC Not Enabled on Database: Run
EXEC sys.sp_cdc_enable_db;on the database - CDC Not Enabled on Table: Run
sys.sp_cdc_enable_tablefor each table - SQL Server Agent Not Running: Start SQL Server Agent service
- Permission Denied: Ensure user has SELECT permissions on tables and CDC schema
- Connection Timeout: Increase
connect_timeoutvalue - LSN Format Error: Ensure LSN is in hex format (e.g., ‘0x00000000000000000000’)
- ODBC Driver Not Found: Install appropriate ODBC driver (e.g., “ODBC Driver 18 for SQL Server”)
- Max LSN Returns NULL: This usually means CDC is not enabled on the database
Debugging
Debugging
Check CDC status on database:Check CDC enabled tables:Check current max LSN:Check min LSN for a capture instance:View recent CDC changes:
Monitoring
Monitoring
Monitor key metrics:Monitor CDC job status:
- Batch Size: Average events per batch
- Flush Rate: How often batches are flushed
- Error Rate: Failed events or connections
- Lag: Time between event occurrence and processing
- LSN Progress: Track Log Sequence Number progress
Integration with Generic IO Sink
The SQL Server CDC source works seamlessly with the Generic IO sink, which provides:- Automatic Column Type Mapping: Automatically maps
_mage_*timestamp columns to appropriate database types (TIMESTAMP, DATETIME2, DateTime64, etc.) based on the target database - Metadata Interpolation: Use metadata values from SQL Server CDC events in sink configurations for dynamic routing and table naming
Supported Databases
Generic IO Sink supports the following databases:- BigQuery
- ClickHouse
- DuckDB
- MySQL
- MSSQL
- Postgres
Metadata Interpolation
You can use metadata values from SQL Server CDC events in your sink configuration using Python string formatting syntax:{schema}: Schema name from the event{table}: Table name from the event{key_columns}: List of primary key column names (e.g.,["id"]or["user_id", "tenant_id"])
{key_columns} in unique_constraints, it will be automatically converted from a string representation to a list. The format supports both Python-style ("['id']") and JSON-style ('["id"]') array strings.
Example Configurations
How Metadata Interpolation Works
-
Message Grouping: Messages are automatically grouped by their interpolated config values. For example, if you use
table_name: "{schema}_{table}", messages fromdbo.userswill be grouped together and written todbo_userstable. -
Key Columns Interpolation: When using
{key_columns}inunique_constraints, the sink automatically:- Converts the list to a string representation during interpolation
- Parses it back to a list (supports both
"['id']"and'["id"]'formats) - Uses it for upsert operations based on
unique_conflict_method
users with primary key id, and you configure unique_constraints: "{key_columns}", it will automatically use ["id"] for upsert operations.
Best Practices
- Enable CDC on Database First: Always enable CDC at the database level before enabling on tables
- Monitor SQL Server Agent: Ensure SQL Server Agent is running for CDC to work
- Filter Events: Use schema/table filters to reduce processing overhead
- Monitor Resources: Watch memory and CPU usage, especially with high-volume tables
- Test Resume: Verify checkpoint functionality with LSN tracking
- Secure Connections: Use encrypted connections in production
- Regular Backups: Backup checkpoint files
- Schema Validation: Test with schema changes
- Use Generic IO Sink: Leverage automatic type mapping and metadata interpolation for flexible data routing
- Metadata Interpolation: Use
{schema},{table}, and{key_columns}for dynamic table routing and upsert configuration - Poll Interval: Adjust
poll_interval_secondsbased on your latency requirements (lower = more frequent polling, higher CPU usage) - Batch Size: Adjust
max_batch_sizebased on your throughput needs (larger = fewer handler calls, more memory usage)
Timestamp Handling
The SQL Server CDC source automatically adds Mage timestamp columns:_mage_created_at: Set to event timestamp (datetime) for INSERT operations_mage_updated_at: Set to event timestamp (datetime) for UPDATE operations_mage_deleted_at: Set to event timestamp (datetime) for DELETE operations
- Proper type handling in downstream databases
- Automatic type conversion in Generic IO sink
- Consistent timezone handling across systems
Limitations
- SQL Server 2008+: Requires SQL Server 2008 or later (Enterprise, Developer, or Standard edition)
- CDC Feature: Requires CDC to be enabled on database and tables
- SQL Server Agent: Requires SQL Server Agent to be running
- Network Dependency: Requires stable network connection
- Memory Usage: Schema caching uses memory
- CDC Retention: Depends on SQL Server CDC retention settings (default: 3 days)
- Timezone: All timestamps are in UTC timezone
- Polling: Uses polling mechanism (not push-based like logical replication)
- UPDATE Events: Only includes after image, not before image
- ODBC Driver: Requires appropriate ODBC driver to be installed
LSN Format
SQL Server CDC uses Log Sequence Numbers (LSN) in binary format, which are converted to hex strings for storage and checkpointing:- Format:
0xfollowed by hexadecimal digits (e.g.,0x00000000000000000000) - Length: Typically 10 bytes (20 hex characters)
- Comparison: LSNs can be compared as integers for range queries
- Checkpoint: Last processed LSN is saved in checkpoint file for resume capability
Capture Instance Names
When CDC is enabled on a table, SQL Server automatically creates a capture instance. The naming convention is:- Format:
{schema}_{table}_CT - Example:
dbo_users_CTfor tableusersin schemadbo
capture_instance in the configuration if needed.