CDC Sink is a tool that enables logical data replication to CockroachDB, the SQL relational database with built-in high availability, horizontal scalability, multi-region geographic distribution, and fully-consistent transactional data.
Data Sources
CDC Sink works with a variety of data sources, including:
- another CockroachDB cluster
- PostgreSQL
- MySQL/MariaDB
- Google Cloud Firestore
This blog post focuses on using PostgreSQL as the source database.
Use Cases
One of the main use cases for CDC Sink is to support low-downtime migration. A low-downtime migration is one in which the allowable scheduled downtime is too short to take a full backup or data dump from the source database and load the data in its entirety onto the target CockroachDB cluster.
Instead, a low-downtime data migration can happen with the following steps:
- Transfer a backup or data dump from the source database to the target CockroachDB cluster, while the source database remains online serving business traffic.
- Start logical replication from the source database to the target CockroachDB cluster.
- Start the scheduled downtime. During the downtime:
- Let the logical replication pipeline drain (which typically takes seconds to a few minutes).
- Point the application at the target CockroachDB cluster.
- End the scheduled downtime.
PostgreSQL as source database – steps
The CDC Sink documentation has a section on using PostgreSQL as the source database. The goal of this blog post is to build on that, to show a step-by-step demo of how to do it. This uses a MacBook laptop.
Prerequisites – Installation
Starting with a MacBook laptop, you need to install four items:
The easiest way to do this is to install a fifth item, a package manager for macOS:
[5.] Homebrew
Each of the above links points to instructions on how to install the relevant component. Make sure all of these are in your command path.
Follow along
The steps below are documented in a Google Sheets spreadsheet, so you can follow along on your own laptop (I’m using a MacBook) if you wish.
Start PostgreSQL
Run the following commands to initialize a PostgreSQL database, start a PostgreSQL server running, and start a command-line PostgreSQL SQL client:
initdb demodatadir
postgres -D demodatadir > /tmp/postgres_output1 2>&1 &
psql postgres
Create some data
In the psql
client, enter the following to:
- list a few things (users, databases, and tables)
- create a demo database called
demodb
- exit
psql
\du
\l
\d
create database demodb;
\q
Now use psql
again to connect to the demodb
database that we just created:
psql demodb
In the psql
client:
- create a test table called
t
- give it an index
- store a few rows of data
- exit
psql
create table t (pk uuid default gen_random_uuid() primary key, fname varchar(100), lname varchar(100), age integer);
create index t_fname_idx on t(fname);
insert into t (fname, lname, age) values ('Joe', 'Swift', 23), ('Carol', 'Wright', 67), ('Beth', 'Russel', 9), ('Randy', 'Byrd', 15);
\q
Configure PostgreSQL write-ahead log level to logical
PostgreSQL uses a write-ahead log (WAL) that can be configured to different log level settings. The log level must be configured to logical
in order to work with CDC Sink. This requires:
- editing the PostgreSQL configuration file
- restarting PostgreSQL to pick up the changed setting
Documentation on setting the WAL is at:
- Write Ahead Log – https://www.postgresql.org/docs/current/runtime-config-wal.html
- Logical Replication – https://www.postgresql.org/docs/current/logical-replication.html
- Setting Parameters – Parameter Interaction via the Configuration File – https://www.postgresql.org/docs/current/config-setting.html#CONFIG-SETTING-CONFIGURATION-FILE
To change the WAL level to logical
, follow these steps:
Make a copy of the original PostgreSQL configuration file:
cp -i demodatadir/postgresql.conf demodatadir/postgresql_ORIGINAL.conf_ORIGINAL
Next, edit the configuration file using your favorite text editor. For example, if you use the vi
text editor, you can enter:
vi demodatadir/postgresql.conf
Look for the line in the configuration file that has the string wal_level
in it. It will be commented out with a #
character, and will look like this:
#wal_level = replica # minimal, replica, or logical
Uncomment this line and change replica
to logical
. It will now look like this:
wal_level = logical # minimal, replica, or logical
Save the configuration file, then:
- restart PostgreSQL
- reconnect to PostgreSQL using the
psql
SQL client:
ps aux | grep post
pg_ctl stop -D demodatadir
ps aux | grep post
postgres -D demodatadir > /tmp/postgres_output2 2>&1 &
ps aux | grep post
psql demodb
Create PostgreSQL publication and logical replication slot
Next, tell PostgreSQL which tables will be replicated by creating two things:
- a publication
- a logical replication slot
To do so, issue the following in the psql
client:
\du
\l
\d
select * from t;
-- list publications
select * from pg_catalog.pg_publication;
-- create publication
CREATE PUBLICATION my_pub FOR ALL TABLES;
-- list publications
select * from pg_catalog.pg_publication;
-- list tables in publications
select * from pg_publication_tables;
-- create logical replication slot
SELECT pg_create_logical_replication_slot('cdc_sink', 'pgoutput');
Create PostgreSQL transaction snapshot
Next, continue using the psql
client to create a transaction snapshot and return its identifier as a text string. We will use this identifier in a future step when we export the data from PostgreSQL.
IMPORTANT: You must leave this psql
session open after you run the statement shown below, because the transaction snapshot identifier has a lifetime only as long as the session that created it.
-- create a consistent point for bulk data export
BEGIN;
SELECT pg_export_snapshot();
This will look something like:
demodb=# begin;
BEGIN
demodb=*# SELECT pg_export_snapshot();
pg_export_snapshot
---------------------
00000003-0000000D-1
(1 row)
In this example, the transaction snapshot identifier is 00000003-0000000D-1
.
REMEMBER, LEAVE THIS SESSION AS-IS FOR NOW — DO NOT COMMIT OR ROLLBACK OR EXIT THE PSQL
CLIENT.
Create PostgreSQL data dump
Next, we will use the pg_dump
command to dump the contents of PostgreSQL into a data file. Replace the value you obtained for the transaction snapshot identifier in the –snapshot parameter:
# replace value for --snapshot below with the value from SELECT pg_export_snapshot();
pg_dump -d demodb --snapshot '00000003-0000000C-1' > pg_dump_at_snapshot
At this point you can go back to the psql session that you left open and commit the transaction:
commit;
We have finished working on the PostgreSQL side. Now it is time to set up CockroachDB.
Start CockroachDB
We will start CockroachDB in single-node mode on our Mac laptop. Then we will start the cockroach sql
command-line client:
ps aux | grep -i cockroach
cockroach start-single-node --insecure --background;
ps aux | grep -i cockroach
cockroach sql --insecure
Create CockroachDB destination database
Next, create a demodb
database in CockroachDB to match the demodb
database in PostgreSQL. In the cockroach sql
client we just started, enter the following:
create database demodb;
use demodb;
Import bulk data from PostgreSQL to CockroachDB
Next we will import the data into CockroachDB that we exported from PostgreSQL. The first step is to create a directory for the data file in a location that CockroachDB can find. To do this, enter in a shell window:
ls -l cockroach-data/extern
mkdir cockroach-data/extern
The extern
directory is a specific directory within the CockroachDB data store directory that CockroachDB uses to reference external files.
Next, copy the PostgreSQL dump file to this new directory:
cp -i pg_dump_at_snapshot cockroach-data/extern
ls -l cockroach-data/extern
Next, we will need to edit the PostgreSQL dump file to comment-out a line that is not compatible with CockroachDB. To do this:
- Edit the file with your text editor
- Look for the line that contains
ALTER PUBLICATION
- Either delete this line, or you can comment it out by adding two dashes at the start of the line
- Save the file
Now the data file is ready to import into CockroachDB. In the cockroach sql
client that we started previously, run the following:
SHOW TABLES;
IMPORT PGDUMP 'nodelocal://1/pg_dump_at_snapshot' WITH ignore_unsupported_statements;
SHOW TABLES;
SELECT * FROM t;
After the import, the SELECT
statement should show the same data that we saw in PostgreSQL:
root@localhost:26257/demodb> SELECT * FROM t;
pk | fname | lname | age
---------------------------------------+-------+--------+------
61d59a18-512c-4f65-a2ad-faceb595cae6 | Carol | Wright | 67
6cad8bc7-c7f2-4401-872a-ba54dcb3dc2f | Joe | Swift | 23
bc01fe61-e095-484e-a512-62fb1f4d37fe | Randy | Byrd | 15
f22fda33-4933-4f3f-83fd-27d9c71b7d8f | Beth | Russel | 9
(4 rows)
Prepare CockroachDB to use CDC Sink
CDC Sink requires that the destination CockroachDB cluster have a database in it called _cdc_sink
. Let’s create it now, using our currently-running session in the cockroach sql
client:
CREATE DATABASE IF NOT EXISTS _cdc_sink;
Start CDC Sink to begin replication
We are now ready to start replication from PostgreSQL to CockroachDB! In shell window, run the following command.
Note: Substitute your own userid for USERID
in the JDBC URL in this command:
cdc-sink pglogical --publicationName=my_pub --sourceConn='postgresql://USERID@localhost:5432/demodb?sslmode=disable' --targetConn='postgresql://root@localhost:26257/?sslmode=disable' --targetDB=demodb --verbose=7
Show logical replication in action
To see logical replication working, we will make changes on the PostgreSQL side and observe those changes on the CockroachDB side.
Demonstrate UPDATE replication
In the PostgreSQL psql client, enter an UPDATE:
demodb=# select * from t;
pk | fname | lname | age
--------------------------------------+-------+--------+-----
6cad8bc7-c7f2-4401-872a-ba54dcb3dc2f | Joe | Swift | 23
61d59a18-512c-4f65-a2ad-faceb595cae6 | Carol | Wright | 67
f22fda33-4933-4f3f-83fd-27d9c71b7d8f | Beth | Russel | 9
bc01fe61-e095-484e-a512-62fb1f4d37fe | Randy | Byrd | 15
(4 rows)
demodb=# update t set age=45 where fname='Joe';
UPDATE 1
demodb=# select * from t;
pk | fname | lname | age
--------------------------------------+-------+--------+-----
61d59a18-512c-4f65-a2ad-faceb595cae6 | Carol | Wright | 67
f22fda33-4933-4f3f-83fd-27d9c71b7d8f | Beth | Russel | 9
bc01fe61-e095-484e-a512-62fb1f4d37fe | Randy | Byrd | 15
6cad8bc7-c7f2-4401-872a-ba54dcb3dc2f | Joe | Swift | 45
(4 rows)
Now in the CockroachDB cockroach sql client, observe that the change has replicated:
root@localhost:26257/demodb> select * from t;
pk | fname | lname | age
---------------------------------------+-------+--------+------
61d59a18-512c-4f65-a2ad-faceb595cae6 | Carol | Wright | 67
6cad8bc7-c7f2-4401-872a-ba54dcb3dc2f | Joe | Swift | 45
bc01fe61-e095-484e-a512-62fb1f4d37fe | Randy | Byrd | 15
f22fda33-4933-4f3f-83fd-27d9c71b7d8f | Beth | Russel | 9
(4 rows)
Demonstrate INSERT replication
In the PostgreSQL psql client, enter an INSERT:
demodb=# insert into t (fname, lname, age) values ('Bob', 'Barker', 99);
INSERT 0 1
demodb=# select * from t;
pk | fname | lname | age
--------------------------------------+-------+--------+-----
61d59a18-512c-4f65-a2ad-faceb595cae6 | Carol | Wright | 67
f22fda33-4933-4f3f-83fd-27d9c71b7d8f | Beth | Russel | 9
bc01fe61-e095-484e-a512-62fb1f4d37fe | Randy | Byrd | 15
6cad8bc7-c7f2-4401-872a-ba54dcb3dc2f | Joe | Swift | 45
76f47970-2a82-4c7d-afd0-500c3eed0c73 | Bob | Barker | 99
(5 rows)
Now in the CockroachDB cockroach sql client, observe that the change has replicated:
root@localhost:26257/demodb> select * from t;
pk | fname | lname | age
---------------------------------------+-------+--------+------
61d59a18-512c-4f65-a2ad-faceb595cae6 | Carol | Wright | 67
6cad8bc7-c7f2-4401-872a-ba54dcb3dc2f | Joe | Swift | 45
76f47970-2a82-4c7d-afd0-500c3eed0c73 | Bob | Barker | 99
bc01fe61-e095-484e-a512-62fb1f4d37fe | Randy | Byrd | 15
f22fda33-4933-4f3f-83fd-27d9c71b7d8f | Beth | Russel | 9
(5 rows)
Demonstrate DELETE replication
In the PostgreSQL psql client, enter a DELETE:
demodb=# delete from t where age<20;
DELETE 2
demodb=# select * from t;
pk | fname | lname | age
--------------------------------------+-------+--------+-----
61d59a18-512c-4f65-a2ad-faceb595cae6 | Carol | Wright | 67
6cad8bc7-c7f2-4401-872a-ba54dcb3dc2f | Joe | Swift | 45
76f47970-2a82-4c7d-afd0-500c3eed0c73 | Bob | Barker | 99
(3 rows)
Now in the CockroachDB cockroach sql client, observe that the change has replicated:
root@localhost:26257/demodb> select * from t;
pk | fname | lname | age
---------------------------------------+-------+--------+------
61d59a18-512c-4f65-a2ad-faceb595cae6 | Carol | Wright | 67
6cad8bc7-c7f2-4401-872a-ba54dcb3dc2f | Joe | Swift | 45
76f47970-2a82-4c7d-afd0-500c3eed0c73 | Bob | Barker | 99
(3 rows)
Congratulations! This shows that logical replication is working!
Cleanup
Now let’s cleanly shut everything down.
Close the CockroachDB cockroach sql
client:
\q
Exit CDC Sink by entering CTRL-C in the shell window that is running CDC Sink:
^C
Clean up the replication setup in PostgreSQL, then exit the psql
client. In the psql client, run the following:
SELECT pg_drop_replication_slot('cdc_sink');
DROP PUBLICATION my_pub;
select * from pg_catalog.pg_publication;
\q
Now stop the PostgreSQL server. In a shell window, run:
ps aux | grep post
pg_ctl stop -D demodatadir
ps aux | grep post
Lastly, stop the CockroachDB server. In a shell window, run the following command to determine the process ID (PID) of the CockroachDB server:
ps aux | grep -i cockroach
Next, use this process ID in the kill
command to terminate the CockroachDB server:
# substitute PID below
kill 99999
ps aux | grep -i cockroach
Closing
CDC Sink is a powerful tool that can be part of a strategy for low-downtime data migration into CockroachDB. We used CDC Sink to demonstrate how to replicate data from PostgreSQL to CockroachDB.