When bulk-syncing (i.e. ETL or ELT) from databases or other data warehouses into your own data warehouse or other storage, you can avoid having Polytomic do full table scans to get updates by utilising one of two methods:
- CDC (Change Data Capture) streaming.
- Setting tracking fields.
CDC streaming is a replication method that incrementally consumes a real-time stream of changes from your database. This is a low-latency operation that avoids table scans altogether.
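To make the idea concrete, here is a minimal sketch of how a consumer might apply a stream of change events to a replica without ever rescanning the source table. The event shape (`op`, `id`, `row`) and the `apply_change` helper are assumptions made for illustration; they are not Polytomic's event format or API.

```python
from typing import Any

def apply_change(replica: dict[int, dict[str, Any]], event: dict[str, Any]) -> None:
    """Apply one change event to an in-memory replica keyed by primary key."""
    op, key = event["op"], event["id"]
    if op in ("insert", "update"):
        # Inserts and updates both carry the full new row in this sketch.
        replica[key] = event["row"]
    elif op == "delete":
        replica.pop(key, None)
    else:
        raise ValueError(f"unknown operation: {op}")

# Hypothetical change stream, in the order the source database emitted it.
events = [
    {"op": "insert", "id": 1, "row": {"name": "Ada"}},
    {"op": "insert", "id": 2, "row": {"name": "Bob"}},
    {"op": "update", "id": 1, "row": {"name": "Ada L."}},
    {"op": "delete", "id": 2},
]

replica: dict[int, dict[str, Any]] = {}
for event in events:
    apply_change(replica, event)

print(replica)  # {1: {'name': 'Ada L.'}}
```

Only the four events are processed; the cost is proportional to the number of changes, not to the size of the table.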
Sometimes your database or data warehouse source does not support CDC streaming. In this case you can still sync tables incrementally if they have a column with monotonically increasing values that indicate a change to the row has occurred (this will commonly be a datetime updated_at column). Setting such a column as a tracking field causes Polytomic's queries to filter on the last successful datetime, which results in much cheaper queries when obtaining updates.
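The kind of filtered query this enables can be sketched as follows, using an in-memory SQLite database. The `users` table, the `fetch_changes` helper, and the checkpoint value are all hypothetical; the exact SQL that Polytomic issues is not documented here and may differ.

```python
import sqlite3

def fetch_changes(conn: sqlite3.Connection, last_seen: str) -> list[tuple]:
    """Return only the rows whose updated_at is later than the checkpoint."""
    return conn.execute(
        "SELECT id, name, updated_at FROM users "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_seen,),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        (1, "Ada", "2024-01-01T09:00:00"),
        (2, "Bob", "2024-01-02T09:00:00"),
        (3, "Cy", "2024-01-03T09:00:00"),
    ],
)

# Assumed checkpoint: the datetime recorded at the last successful sync.
# ISO 8601 strings sort chronologically, so a plain string comparison works.
checkpoint = "2024-01-01T12:00:00"
changed = fetch_changes(conn, checkpoint)
print([row[0] for row in changed])  # [2, 3]
```

With an index on the tracking column, the database can answer such a query without reading rows that have not changed since the checkpoint.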
Tracking fields for each table can be set using the ... icon in your Bulk Sync table configuration. Polytomic allows you to set tracking fields for a partial list of tables, recognising that not all your tables may contain suitable columns.
The first run after setting tracking fields will be a full scan in order for Polytomic to compute some state for future runs. Subsequent runs will be incremental using the set tracking fields.
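The run sequence described above can be sketched as a small state machine. This assumes the stored state is simply the highest tracking value seen so far; what Polytomic actually computes on the first run is not specified here, and `run_sync` is a hypothetical helper, not part of any Polytomic API.

```python
from typing import Optional

def run_sync(rows: list[dict], checkpoint: Optional[str]) -> tuple[list[dict], Optional[str]]:
    """Return (rows to sync, new checkpoint) for a single run."""
    if checkpoint is None:
        # No state yet: the first run reads every row.
        selected = list(rows)
    else:
        # Later runs keep only rows changed since the checkpoint. A real
        # source would apply this as a WHERE filter rather than in Python.
        selected = [r for r in rows if r["updated_at"] > checkpoint]
    # Advance the checkpoint to the newest value seen, if any rows were read.
    new_checkpoint = max((r["updated_at"] for r in selected), default=checkpoint)
    return selected, new_checkpoint

table = [
    {"id": 1, "updated_at": "2024-01-01T09:00:00"},
    {"id": 2, "updated_at": "2024-01-02T09:00:00"},
]

first, state = run_sync(table, None)    # full scan, computes the state
table.append({"id": 3, "updated_at": "2024-01-03T09:00:00"})
second, state = run_sync(table, state)  # incremental run

print(len(first), len(second), state)  # 2 1 2024-01-03T09:00:00
```

A run that finds no new rows leaves the checkpoint unchanged, so the next run resumes from the same point.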
Setting tracking fields for an already-running sync
Polytomic lets you set tracking fields for an already-running sync. For example, perhaps your tables did not have suitable tracking columns when you first created your sync but do now.
The behaviour when doing so is no different: the first sync after setting tracking fields will run full scans (you can trigger this yourself by clicking 'Sync now') and subsequent ones will be incremental. There is no need to click 'Force full resync' to activate tracking fields.