Change Data Capture (CDC)

1. What is Change Data Capture?

Change Data Capture (CDC) is the ability to identify and track changes made to data in a database. While there are a number of common patterns database administrators can employ to build custom CDC solutions, many database vendors provide embedded CDC features out of the box. This blog post will focus on Microsoft’s implementation of change data capture available across SQL Server and Azure SQL.

2. CDC in SQL Server and Azure SQL

Change Data Capture was initially introduced as a feature back in SQL Server 2008, and continues to be available today across supported versions of SQL Server, Azure SQL Managed Instance, and Azure SQL Database. Once enabled, the service monitors for DML changes (INSERT, UPDATE, and DELETE) on a CDC enabled table, then persists these modifications to a change table that mirrors the column structure of the tracked source table along with additional metadata columns, enabling downstream consumption by other users and systems (e.g. show me all rows that changed within a certain time period).

3. Process Flow

  1. DML operations (DELETE, INSERT, or UPDATE) occur against a source table in the database. This activity is written to the transaction log.

  2. Change data capture process monitors for changes against tracked source tables by periodically scanning the transaction log.

  3. Change data capture process persists a copy of these changes to the associated change table.

  4. Change data capture data is made available via the associated change data capture query functions.

4. Key Concepts

Capture Instance
When a table is enabled for change data capture, an associated capture instance is created. The capture instance consists of a change table (e.g. cdc.<capture instance>_CT) and up to two query functions (e.g. cdc.fn_cdc_get_all_changes_<capture_instance> and cdc.fn_cdc_get_net_changes_<capture_instance>).

Capture Process
The capture process monitors the transaction log by scanning for changes, then subsequently writing identified changes to the associated change table.

Change Table
When a table is enabled for change data capture, an associated change table is created (e.g. cdc.<capture_instance>_CT). The change table consists of columns that mirror the columns from the source table (e.g. CustomerID, CustomerAddress, etc), as well as additional metadata columns that provide additional information that is relevant to the recorded change.

Query Functions
While change tables can be queried directly (e.g. SELECT * FROM cdc.dbo_Customer_CT), system functions are made available which can provide a filtered result set. It is recommended you do not query the system tables directly. Instead, leverage the built-in query functions.

5. Working Example

The following snippets of SQL code walks through:

  1. How to enable CDC

  2. Implications of INSERT, DELETE, and UPDATE operations on the underlying change tables

  3. How to query for changes using the query functions

  4. How to disable CDC

Alternatively, the code samples are available for download in this Python Notebook which can be opened in tools such as Azure Data Studio.

Enable CDC
The first step is to enable change data capture, first at the database level, then followed by enabling change data capture for a specific target table.

-- 01. Create Table
DROP TABLE IF EXISTS dbo.Customer;
CREATE TABLE Customer
(
    CustomerID INT PRIMARY KEY,
    CustomerAddress VARCHAR(255)
);

-- 02. Enable change data capture for the current database
sys.sp_cdc_enable_db

-- 03. Enable change data capture for the specified source table
sys.sp_cdc_enable_table 
    @source_schema = N'dbo',
    @source_name   = N'Customer',
    @role_name     = NULL,
    @supports_net_changes = 1;

-- 04. As we have yet to trigger a DELETE/INSERT/UPDATE, the change table is initially empty
SELECT * FROM cdc.dbo_Customer_CT

At this point, the change table will have been initialised (albeit empty), until a change occurs.

Track Changes
Now that we have CDC enabled on the database and monitoring for changes on a specific source table, by performing a variety of DML operations (INSERT, DELETE, and UPDATE), we can see the way these changes are persisted within the associated change table.

-- 05. Populate the Customer table (INSERT)
INSERT INTO dbo.Customer
    (CustomerID, CustomerAddress)
VALUES
    (1, '82 Margate Drive, Sheffield, S4 8FQ'),
    (2, '135 High Barns, Ely, CB7 4RH'),
    (3, '39 Queen Annes Drive, Bedale, DL8 2EL');
SELECT * FROM dbo.Customer;

-- 06. The change table returns three rows in total, one row for each INSERT. 
-- The __$operation column identifies the DML operation associated with the change. 1 = delete, 2 = insert, 3 = update (old values), 4 = update (new values).
-- NOTE: There will be some latency between a transaction log being commited on a source table and the last captured transaction being committed on the change table.
-- If the records are not immediately available, run cell again.
SELECT * FROM cdc.dbo_Customer_CT

The initial load of data into the source table has resulted in three rows appearing in our associated change table, one for each INSERT.

-- 07. Remove a row from the Customer table (DELETE)
DELETE FROM dbo.Customer WHERE CustomerID = 2;
SELECT * FROM dbo.Customer;

-- 08. The change table returns four rows in total, one additional row for the DELETE.
-- The __$operation column identifies the DML operation associated with the change. 1 = delete, 2 = insert, 3 = update (old values), 4 = update (new values).
SELECT * FROM cdc.dbo_Customer_CT;

Similarly, only one additional row is inserted for the delete operation..

-- 09. Update a row from the Customer table (UPDATE)
UPDATE dbo.Customer SET CustomerAddress = 'Guyzance Cottage, Guyzance NE65 9AF' WHERE CustomerID = 3;
SELECT * FROM dbo.Customer;

-- 10. The change table returns six rows in total, two additional rows for the UPDATE (one for the old, one for the new).
-- The __$operation column identifies the DML operation associated with the change. 1 = delete, 2 = insert, 3 = update (old values), 4 = update (new values).
SELECT * FROM cdc.dbo_Customer_CT;

an update operation on the other hand results in two additional rows being added to the change table, one row to represent the old values, one ROW to represent the new values.

Query Changes
In addition to the change table, a change data capture instance includes query functions that makes it relatively easy to isolate specific changes. For example, show me all changes that occurred over the last day. It is recommended to use these query functions as opposed to querying the system change tables directly.

-- 11. It is recommended that you do not query the system tables (e.g. cdc._CT) directly. 
-- Function: cdc.fn_cdc_get_all_changes_
-- Description: Returns one row for each change applied to the source table within the specified log sequence number (LSN) range.
DECLARE @from_lsn binary(10), @to_lsn binary(10);  
SET @from_lsn = sys.fn_cdc_get_min_lsn('dbo_Customer');  
SET @to_lsn   = sys.fn_cdc_get_max_lsn();  
SELECT * FROM cdc.fn_cdc_get_all_changes_dbo_Customer (@from_lsn, @to_lsn, N'all');

Filtered result set from cdc.fn_CDC_get_all_changes_<capture instance>

-- 12. The following function will only be available if @supports_net_changes was activated when enabling the table for CDC (sys.sp_cdc_enable_table)
-- Function: cdc.fn_cdc_get_net_changes_
-- Description: Returns one net change row for each source row changed within the specified Log Sequence Numbers (LSN) range.
DECLARE @begin_time datetime, @end_time datetime, @from_lsn binary(10), @to_lsn binary(10); 
SET @begin_time = DATEADD(day, -1, GETDATE());
SET @end_time = GETDATE();
SET @from_lsn = sys.fn_cdc_map_time_to_lsn('smallest greater than or equal', @begin_time);  
SET @to_lsn = sys.fn_cdc_map_time_to_lsn('largest less than or equal', @end_time);  
SELECT * FROM cdc.fn_cdc_get_net_changes_dbo_Customer(@from_lsn, @to_lsn, 'all');  

Filtered result set from cdc.fn_CDC_get_NET_changes_<capture instance>

Disable CDC
Finally, the steps below illustrate how to turn CDC off, both at a table level, and a database level.

-- 13. Disables change data capture for the specified source table and capture instance in the current database
sys.sp_cdc_disable_table
    @source_schema = N'dbo',   
    @source_name = N'Customer',  
    @capture_instance = N'dbo_Customer';

-- 14. Disables change data capture for the current database
sys.sp_cdc_disable_db;

-- 15. Drop Customer table if it exists
DROP TABLE IF EXISTS dbo.Customer;

6. Additional Resources

7. Reference

System Dynamic Management Views

Name Description
sys.dm_cdc_errorsReturns one row for each error encountered during the change data capture log scan session.
sys.dm_cdc_log_scan_sessionsReturns one row for each log scan session in the current database.

System Functions

Name Description
cdc.fn_cdc_get_all_changes_<capture_instance>Returns one row for each change applied to the source table within the specified LSN range.
cdc.fn_cdc_get_net_changes_<capture_instance>Returns one net change row for each source row changed within the specified LSN range.
sys.fn_cdc_decrement_lsnReturns the previous log sequence number (LSN) in the sequence based upon the specified LSN.
sys.fn_cdc_get_column_ordinalReturns the column ordinal as it appears in the change table.
sys.fn_cdc_get_max_lsnReturns the maximum LSN from the start_lsn column in the cdc.lsn_time_mapping system table.
sys.fn_cdc_get_min_lsnReturns the start_lsn column value for the specified capture instance.
sys.fn_cdc_has_column_changedIdentifies whether the specified update mask indicates that the column has been updated.
sys.fn_cdc_increment_lsnReturns the next log sequence number (LSN) in the sequence based upon the specified LSN.
sys.fn_cdc_is_bit_setIndicates whether a captured column has been updated.
sys.fn_cdc_map_lsn_to_timeReturns the date and time valuefor the specified log sequence number (LSN).
sys.fn_cdc_map_time_to_lsnReturns the log sequence number (LSN) value for the specified time.
sys.fn_all_changes_<capture_instance>Wrappers for the all changes query functions.
sys.fn_net_changes_<capture_instance>Wrappers for the net changes query functions.

System Stored Procedures

Name Description
sys.sp_cdc_add_jobCreates a change data capture cleanup or capture job in the current database.
sys.sp_cdc_change_jobModifies the configuration of a change data capture cleanup or capture job in the current database.
sys.sp_cdc_cleanup_change_tableRemoves rows from the change table in the current database based on the specified low_water_mark value.
sys.sp_cdc_disable_dbDisables change data capture for the current database.
sys.sp_cdc_disable_tableDisables change data capture for the specified source table and capture instance in the current database.
sys.sp_cdc_drop_jobRemoves a change data capture cleanup or capture job for the current database from msdb.
sys.sp_cdc_enable_dbEnables change data capture for the current database.
sys.sp_cdc_enable_tableEnables change data capture for the specified source table in the current database.
sys.sp_cdc_generate_wrapper_functionGenerates scripts to create wrapper functions for the CDC query functions that are available in SQL Server.
sys.sp_cdc_get_captured_columnsReturns CDC metadata for the captured source columns tracked by the specified capture instance.
sys.sp_cdc_get_ddl_historyReturns the DDL change history associated with the specified capture instance.
sys.sp_cdc_help_change_data_captureReturns the CDC configuration for each table enabled for change data capture in the current database.
sys.sp_cdc_help_jobsReports information about all change data capture cleanup or capture jobs in the current database.
sys.sp_cdc_scanExecutes the change data capture log scan operation.
sys.sp_cdc_start_jobStarts a change data capture cleanup or capture job for the current database.
sys.sp_cdc_stop_jobStops a change data capture cleanup or capture job for the current database.

System Tables

Name Description
cdc.<capture_instance>_CTIs the change table created when change data capture is enabled on a source table.
cdc.captured_columnsReturns one row for each column tracked in a capture instance.
cdc.change_tablesReturns one row for each change table in the database.
cdc.ddl_historyReturns one row for each data definition language (DDL) change made to tables that are enabled for change data capture.
cdc.lsn_time_mappingReturns one row for each transaction having rows in a change table.
cdc.index_columnsReturns one row for each index column associated with a change table.
cdc.cdc_jobsStores the change data capture configuration parameters for capture and cleanup jobs.