Incremental syncing from databases

When bulk syncing (ETL or ELT) from databases or data warehouses into your own data warehouse or other storage, you can avoid having Polytomic run full table scans to pick up changes by using one of two methods:

  • CDC (Change Data Capture) streaming.
  • Setting tracking fields.

CDC streaming

CDC streaming is a replication method that incrementally consumes a real-time stream of changes from your database. This is a low-latency operation that avoids table scans entirely.
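To make the idea concrete, here is a minimal sketch of how a consumer might apply a stream of change events to a destination copy. The event shape (`op`, `key`, `row`) and the `apply_change` helper are hypothetical and chosen for illustration only; they are not Polytomic's internal format, and real PostgreSQL or MySQL change streams use their own formats.

```python
from typing import Any

# Hypothetical event shape: {"op": ..., "key": ..., "row": ...}.
# Real CDC streams carry their own formats; this is only an illustration.
def apply_change(dest: dict[int, dict[str, Any]], event: dict[str, Any]) -> None:
    """Apply a single change event to an in-memory copy of a table."""
    op = event["op"]
    key = event["key"]
    if op in ("insert", "update"):
        dest[key] = event["row"]   # upsert the latest version of the row
    elif op == "delete":
        dest.pop(key, None)        # deletes are visible in a change stream
    else:
        raise ValueError(f"unknown op: {op}")

destination: dict[int, dict[str, Any]] = {}
stream = [
    {"op": "insert", "key": 1, "row": {"id": 1, "name": "Ada"}},
    {"op": "insert", "key": 2, "row": {"id": 2, "name": "Grace"}},
    {"op": "update", "key": 1, "row": {"id": 1, "name": "Ada L."}},
    {"op": "delete", "key": 2},
]
for event in stream:
    apply_change(destination, event)
```

Each event touches only the row it describes, so the source table is never scanned.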

Polytomic supports CDC streaming from PostgreSQL and MySQL.

Tracking fields


Tracking fields do not propagate deletes

Setting tracking fields means that deletes will not be propagated to your destination warehouse/database/cloud storage bucket.

Sometimes your database or data warehouse source does not support CDC streaming. In this case you can still sync tables incrementally if they have a column whose values increase monotonically whenever a row changes (commonly a datetime updated_at column). Setting such a column as a tracking field causes Polytomic's queries to filter on the datetime from the last successful sync, which results in much cheaper queries when obtaining updates.
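The following is a minimal sketch of the kind of filtered query a tracking field makes possible, using an in-memory SQLite table. The `users` table, the `fetch_changes` helper, the strict `>` comparison, and the stored `last_seen` value are all assumptions made for illustration; the queries Polytomic actually issues may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        (1, "Ada", "2024-01-01T00:00:00"),
        (2, "Grace", "2024-01-02T00:00:00"),
        (3, "Edsger", "2024-01-03T00:00:00"),
    ],
)

def fetch_changes(conn: sqlite3.Connection, last_seen: str) -> list:
    """Return only rows whose tracking column moved past the stored value."""
    return conn.execute(
        "SELECT id, name, updated_at FROM users "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_seen,),
    ).fetchall()

# Hypothetical value remembered from the previous successful run.
last_seen = "2024-01-03T00:00:00"

# Between runs, one row is updated and another is deleted.
conn.execute(
    "UPDATE users SET name = 'Ada L.', updated_at = '2024-01-04T00:00:00' WHERE id = 1"
)
conn.execute("DELETE FROM users WHERE id = 2")

changes = fetch_changes(conn, last_seen)
# Only the updated row is returned; the deleted row simply no longer exists,
# so a filter on updated_at cannot report it.
```

The sketch also shows why this approach cannot propagate deletes: a deleted row leaves nothing behind for the filter to match.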

Setting tracking fields

Tracking fields for each table can be set using the ... icon in your Bulk Sync table configuration. Polytomic allows you to set tracking fields for a partial list of tables, recognising that not all your tables may contain suitable columns.

The first run after setting tracking fields will be a full scan in order for Polytomic to compute some state for future runs. Subsequent runs will be incremental using the set tracking fields.
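The run sequence described above can be sketched as a small state machine. This is an illustration under stated assumptions, not Polytomic's implementation: `run_sync` and its stored `cursor` are hypothetical names, and the "state" is modelled as nothing more than the largest tracking value seen so far.

```python
from typing import Optional

Row = tuple[int, str]  # (id, updated_at)

def run_sync(rows: list[Row], cursor: Optional[str]) -> tuple[list[Row], Optional[str]]:
    """Return the rows to sync plus the cursor to store for the next run."""
    if cursor is None:
        selected = list(rows)                          # first run: full scan
    else:
        selected = [r for r in rows if r[1] > cursor]  # later runs: incremental
    seen = [r[1] for r in selected]
    if cursor is not None:
        seen.append(cursor)                            # never move the cursor backwards
    new_cursor = max(seen) if seen else None
    return selected, new_cursor

table: list[Row] = [(1, "2024-01-01"), (2, "2024-01-02")]

# First run after setting the tracking field: no stored state, so scan everything.
first_batch, cursor = run_sync(table, None)

# A new row arrives; the next run reads only rows past the stored cursor.
table.append((3, "2024-01-03"))
second_batch, cursor = run_sync(table, cursor)
```

The first call returns every row and establishes the cursor; the second returns only the row added afterwards.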

Setting tracking fields on an already-running sync

Polytomic lets you set tracking fields on an already-running sync. For example, perhaps your tables did not have suitable tracking columns when you first created your sync but do now.

The behaviour when doing so is no different: the first sync after setting tracking fields will run full scans (you can trigger this yourself by clicking 'Sync now') and subsequent ones will be incremental. There is no need to click 'Force full resync' to activate tracking fields.