CDC replication

Stream changes from Microsoft SQL Server databases to data warehouses, other databases, and cloud storage buckets like S3.

📘

CDC replication only for Bulk Syncs

CDC replication from Microsoft SQL Server is only available for Bulk Syncs.

When creating a Polytomic Bulk Sync from Microsoft SQL Server into your data warehouse or other storage systems, it's preferred (though not required) for Polytomic to utilize CDC (change data capture) replication. This will avoid Polytomic running full table scans to figure out changes since the last sync. Instead, Polytomic will be capture changes from your database without scanning its tables.

Requirements

To enable this, Polytomic requires the following to be set on your Microsoft SQL Server database:

  • Enable CDC on your database
    Run this query on your Microsoft SQL Server database:

    EXEC sys.sp_cdc_enable_db;
    

    See Microsoft's documentation for more details.

  • Enable CDC for each table you want to sync

    Polytomic expects exactly one change capture enabled per table. Run this query per table you want to enable CDC for (substitute SCHEMA_NAME, TABLE_NAME, and USER_NAME with your values):

    EXEC sys.sp_cdc_enable_table @source_schema = SCHEMA_NAME, @source_name = TABLE_NAME, @role_name = USER_NAME, @supports_net_changes = 0;
    

    sp_cdc_enable_table enables a CDC table for the specified source table and its columns at that point in time.

    If the schema for the table changes then the CDC table will need to be removed and re-added with the new columns by following these steps:

    Find the capture instance:

    EXEC sys.sp_cdc_help_change_data_capture @source_schema = 'SCHEMA_NAME', @source_name = 'TABLE_NAME';
    

    Remove the capture instance:

    EXEC sys.sp_cdc_disable_table @source_schema = 'SCHEMA_NAME', @source_name   = 'TABLE_NAME', @capture_instance = 'CAPTURE_INSTANCE_NAME;
    

    Add the capture instance again:

    EXEC sys.sp_cdc_enable_table @source_schema = SCHEMA_NAME, @source_name = TABLE_NAME, @role_name = admin, @supports_net_changes = 0;
    

    A resync will also be required in Polytomic.

    Polytomic will detect if there are enabled columns in the sync that are not present in the CDC table and alert via a sync warning.

    See Microsoft's documentation for details.

Set Polytomic to read from your CDC tables

Once the previous steps are done, be sure to turn on the Use change data capture for bulk syncs setting on your Microsoft SQL Server connection in Polytomic:

Advanced settings

Propagating deletes

Polytomic offers two options for propagating deletes during CDC replication:

  • Soft deletes
  • Hard deletes

By default, Polytomic will propagate deletes from your source as soft-deletes by marking the __polytomic_deleted_at column for each destination table with a datetime reflecting deletion time.

You can change this behaviour to propagating deletes as hard deletes instead. To do so, simply check the Propagate hard deletes instead of soft deletes checkbox in the Advanced settings option at the bottom of your sync configuration.