CDC Sink Replication from PostgreSQL to CockroachDB

CDC Sink is a tool that enables logical data replication to CockroachDB, the SQL relational database with built-in high availability, horizontal scalability, multi-region geographic distribution, and fully-consistent transactional data.

Data Sources

CDC Sink works with a variety of data sources, including:

another CockroachDB cluster
PostgreSQL
MySQL/MariaDB
Google Cloud Firestore

This blog post focuses on using PostgreSQL as the source database.

Use Cases

One of the main use cases for CDC Sink is to support low-downtime migration. A low-downtime migration is one in which the allowable scheduled downtime is too short to take a full backup or data dump from the source database and load the data in its entirety onto the target CockroachDB cluster.

Instead, a low-downtime data migration can happen with the following steps:

Transfer a backup or data dump from the source database to the target CockroachDB cluster, while the source database remains online serving business traffic.
Start logical replication from the source database to the target CockroachDB cluster.
Start the scheduled downtime. During the downtime:
- Let the logical replication pipeline drain (which typically takes seconds to a few minutes).
- Point the application at the target CockroachDB cluster.
End the scheduled downtime.

PostgreSQL as source database – steps

The CDC Sink documentation has a section on using PostgreSQL as the source database. The goal of this blog post is to build on that, to show a step-by-step demo of how to do it. This uses a MacBook laptop.

Prerequisites – Installation

Starting with a MacBook laptop, you need to install four items:

The easiest way to do this is to install a fifth item, a package manager for macOS:

[5.] Homebrew

Each of the above links points to instructions on how to install the relevant component. Make sure all of these are in your command path.

Follow along

The steps below are documented in a Google Sheets spreadsheet, so you can follow along on your own laptop (I’m using a MacBook) if you wish.

Start PostgreSQL

Run the following commands to initialize a PostgreSQL database, start a PostgreSQL server running, and start a command-line PostgreSQL SQL client:

initdb demodatadir
postgres -D demodatadir > /tmp/postgres_output1 2>&1 &
psql postgres

Create some data

In the psql client, enter the following to:

list a few things (users, databases, and tables)
create a demo database called demodb
exit psql

\du
\l
\d
create database demodb;
\q

Now use psql again to connect to the demodb database that we just created:

psql demodb

In the psql client:

create a test table called t
give it an index
store a few rows of data
exit psql

create table t (pk uuid default gen_random_uuid() primary key, fname varchar(100), lname varchar(100), age integer);
create index t_fname_idx on t(fname);
insert into t (fname, lname, age) values ('Joe', 'Swift', 23), ('Carol', 'Wright', 67), ('Beth', 'Russel', 9), ('Randy', 'Byrd', 15);
\q

Configure PostgreSQL write-ahead log level to logical

PostgreSQL uses a write-ahead log (WAL) that can be configured to different log level settings. The log level must be configured to logical in order to work with CDC Sink. This requires:

editing the PostgreSQL configuration file
restarting PostgreSQL to pick up the changed setting

Documentation on setting the WAL is at:

Write Ahead Log – https://www.postgresql.org/docs/current/runtime-config-wal.html
Logical Replication – https://www.postgresql.org/docs/current/logical-replication.html
Setting Parameters – Parameter Interaction via the Configuration File – https://www.postgresql.org/docs/current/config-setting.html#CONFIG-SETTING-CONFIGURATION-FILE

To change the WAL level to logical, follow these steps:

Make a copy of the original PostgreSQL configuration file:

cp -i demodatadir/postgresql.conf demodatadir/postgresql_ORIGINAL.conf_ORIGINAL

Next, edit the configuration file using your favorite text editor. For example, if you use the vi text editor, you can enter:

vi demodatadir/postgresql.conf

Look for the line in the configuration file that has the string wal_level in it. It will be commented out with a # character, and will look like this:

#wal_level = replica                    # minimal, replica, or logical

Uncomment this line and change replica to logical. It will now look like this:

wal_level = logical                     # minimal, replica, or logical

Save the configuration file, then:

restart PostgreSQL
reconnect to PostgreSQL using the psql SQL client:

ps aux | grep post
pg_ctl stop -D demodatadir
ps aux | grep post
postgres -D demodatadir > /tmp/postgres_output2 2>&1 &
ps aux | grep post
psql demodb

Create PostgreSQL publication and logical replication slot

Next, tell PostgreSQL which tables will be replicated by creating two things:

a publication
a logical replication slot

To do so, issue the following in the psql client:

\du
\l
\d
select * from t;
-- list publications
select * from pg_catalog.pg_publication;
-- create publication
CREATE PUBLICATION my_pub FOR ALL TABLES;
-- list publications
select * from pg_catalog.pg_publication;
-- list tables in publications
select * from pg_publication_tables;
-- create logical replication slot
SELECT pg_create_logical_replication_slot('cdc_sink', 'pgoutput');

Create PostgreSQL transaction snapshot

Next, continue using the psql client to create a transaction snapshot and return its identifier as a text string. We will use this identifier in a future step when we export the data from PostgreSQL.

IMPORTANT: You must leave this psql session open after you run the statement shown below, because the transaction snapshot identifier has a lifetime only as long as the session that created it.

-- create a consistent point for bulk data export
BEGIN;
SELECT pg_export_snapshot();

This will look something like:

demodb=# begin;
BEGIN
demodb=*# SELECT pg_export_snapshot();
 pg_export_snapshot  
---------------------
 00000003-0000000D-1
(1 row)

In this example, the transaction snapshot identifier is 00000003-0000000D-1.

REMEMBER, LEAVE THIS SESSION AS-IS FOR NOW — DO NOT COMMIT OR ROLLBACK OR EXIT THE PSQL CLIENT.

Create PostgreSQL data dump

Next, we will use the pg_dump command to dump the contents of PostgreSQL into a data file. Replace the value you obtained for the transaction snapshot identifier in the –snapshot parameter:

# replace value for --snapshot below with the value from SELECT pg_export_snapshot();
pg_dump -d demodb --snapshot '00000003-0000000C-1' > pg_dump_at_snapshot

At this point you can go back to the psql session that you left open and commit the transaction:

commit;

We have finished working on the PostgreSQL side. Now it is time to set up CockroachDB.

Start CockroachDB

We will start CockroachDB in single-node mode on our Mac laptop. Then we will start the cockroach sql command-line client:

ps aux | grep -i cockroach
cockroach start-single-node --insecure --background;
ps aux | grep -i cockroach
cockroach sql --insecure

Create CockroachDB destination database

Next, create a demodb database in CockroachDB to match the demodb database in PostgreSQL. In the cockroach sql client we just started, enter the following:

create database demodb;
use demodb;

Import bulk data from PostgreSQL to CockroachDB

Next we will import the data into CockroachDB that we exported from PostgreSQL. The first step is to create a directory for the data file in a location that CockroachDB can find. To do this, enter in a shell window:

ls -l cockroach-data/extern
mkdir cockroach-data/extern

The extern directory is a specific directory within the CockroachDB data store directory that CockroachDB uses to reference external files.

Next, copy the PostgreSQL dump file to this new directory:

cp -i pg_dump_at_snapshot cockroach-data/extern
ls -l cockroach-data/extern

Next, we will need to edit the PostgreSQL dump file to comment-out a line that is not compatible with CockroachDB. To do this:

Edit the file with your text editor
Look for the line that contains ALTER PUBLICATION
Either delete this line, or you can comment it out by adding two dashes at the start of the line
Save the file

Now the data file is ready to import into CockroachDB. In the cockroach sql client that we started previously, run the following:

SHOW TABLES;
IMPORT PGDUMP 'nodelocal://1/pg_dump_at_snapshot' WITH ignore_unsupported_statements;
SHOW TABLES;
SELECT * FROM t;

After the import, the SELECT statement should show the same data that we saw in PostgreSQL:

root@localhost:26257/demodb> SELECT * FROM t;
                   pk                  | fname | lname  | age
---------------------------------------+-------+--------+------
  61d59a18-512c-4f65-a2ad-faceb595cae6 | Carol | Wright |  67
  6cad8bc7-c7f2-4401-872a-ba54dcb3dc2f | Joe   | Swift  |  23
  bc01fe61-e095-484e-a512-62fb1f4d37fe | Randy | Byrd   |  15
  f22fda33-4933-4f3f-83fd-27d9c71b7d8f | Beth  | Russel |   9
(4 rows)

Prepare CockroachDB to use CDC Sink

CDC Sink requires that the destination CockroachDB cluster have a database in it called _cdc_sink. Let’s create it now, using our currently-running session in the cockroach sql client:

CREATE DATABASE IF NOT EXISTS _cdc_sink;

Start CDC Sink to begin replication

We are now ready to start replication from PostgreSQL to CockroachDB! In shell window, run the following command.

Note: Substitute your own userid for USERID in the JDBC URL in this command:

cdc-sink pglogical --publicationName=my_pub --sourceConn='postgresql://USERID@localhost:5432/demodb?sslmode=disable' --targetConn='postgresql://root@localhost:26257/?sslmode=disable' --targetDB=demodb  --verbose=7

Show logical replication in action

To see logical replication working, we will make changes on the PostgreSQL side and observe those changes on the CockroachDB side.

Demonstrate UPDATE replication

In the PostgreSQL psql client, enter an UPDATE:

demodb=# select * from t;
                  pk                  | fname | lname  | age 
--------------------------------------+-------+--------+-----
 6cad8bc7-c7f2-4401-872a-ba54dcb3dc2f | Joe   | Swift  |  23
 61d59a18-512c-4f65-a2ad-faceb595cae6 | Carol | Wright |  67
 f22fda33-4933-4f3f-83fd-27d9c71b7d8f | Beth  | Russel |   9
 bc01fe61-e095-484e-a512-62fb1f4d37fe | Randy | Byrd   |  15
(4 rows)

demodb=# update t set age=45 where fname='Joe';
UPDATE 1
demodb=# select * from t;
                  pk                  | fname | lname  | age 
--------------------------------------+-------+--------+-----
 61d59a18-512c-4f65-a2ad-faceb595cae6 | Carol | Wright |  67
 f22fda33-4933-4f3f-83fd-27d9c71b7d8f | Beth  | Russel |   9
 bc01fe61-e095-484e-a512-62fb1f4d37fe | Randy | Byrd   |  15
 6cad8bc7-c7f2-4401-872a-ba54dcb3dc2f | Joe   | Swift  |  45
(4 rows)

Now in the CockroachDB cockroach sql client, observe that the change has replicated:

root@localhost:26257/demodb> select * from t;
                   pk                  | fname | lname  | age
---------------------------------------+-------+--------+------
  61d59a18-512c-4f65-a2ad-faceb595cae6 | Carol | Wright |  67
  6cad8bc7-c7f2-4401-872a-ba54dcb3dc2f | Joe   | Swift  |  45
  bc01fe61-e095-484e-a512-62fb1f4d37fe | Randy | Byrd   |  15
  f22fda33-4933-4f3f-83fd-27d9c71b7d8f | Beth  | Russel |   9
(4 rows)

Demonstrate INSERT replication

In the PostgreSQL psql client, enter an INSERT:

demodb=# insert into t (fname, lname, age) values ('Bob', 'Barker', 99);
INSERT 0 1
demodb=# select * from t;
                  pk                  | fname | lname  | age 
--------------------------------------+-------+--------+-----
 61d59a18-512c-4f65-a2ad-faceb595cae6 | Carol | Wright |  67
 f22fda33-4933-4f3f-83fd-27d9c71b7d8f | Beth  | Russel |   9
 bc01fe61-e095-484e-a512-62fb1f4d37fe | Randy | Byrd   |  15
 6cad8bc7-c7f2-4401-872a-ba54dcb3dc2f | Joe   | Swift  |  45
 76f47970-2a82-4c7d-afd0-500c3eed0c73 | Bob   | Barker |  99
(5 rows)

Now in the CockroachDB cockroach sql client, observe that the change has replicated:

root@localhost:26257/demodb> select * from t;
                   pk                  | fname | lname  | age
---------------------------------------+-------+--------+------
  61d59a18-512c-4f65-a2ad-faceb595cae6 | Carol | Wright |  67
  6cad8bc7-c7f2-4401-872a-ba54dcb3dc2f | Joe   | Swift  |  45
  76f47970-2a82-4c7d-afd0-500c3eed0c73 | Bob   | Barker |  99
  bc01fe61-e095-484e-a512-62fb1f4d37fe | Randy | Byrd   |  15
  f22fda33-4933-4f3f-83fd-27d9c71b7d8f | Beth  | Russel |   9
(5 rows)

Demonstrate DELETE replication

In the PostgreSQL psql client, enter a DELETE:

demodb=# delete from t where age<20;
DELETE 2
demodb=# select * from t;
                  pk                  | fname | lname  | age 
--------------------------------------+-------+--------+-----
 61d59a18-512c-4f65-a2ad-faceb595cae6 | Carol | Wright |  67
 6cad8bc7-c7f2-4401-872a-ba54dcb3dc2f | Joe   | Swift  |  45
 76f47970-2a82-4c7d-afd0-500c3eed0c73 | Bob   | Barker |  99
(3 rows)

Now in the CockroachDB cockroach sql client, observe that the change has replicated:

root@localhost:26257/demodb> select * from t;
                   pk                  | fname | lname  | age
---------------------------------------+-------+--------+------
  61d59a18-512c-4f65-a2ad-faceb595cae6 | Carol | Wright |  67
  6cad8bc7-c7f2-4401-872a-ba54dcb3dc2f | Joe   | Swift  |  45
  76f47970-2a82-4c7d-afd0-500c3eed0c73 | Bob   | Barker |  99
(3 rows)

Congratulations! This shows that logical replication is working!

Cleanup

Now let’s cleanly shut everything down.

Close the CockroachDB cockroach sql client:

\q

Exit CDC Sink by entering CTRL-C in the shell window that is running CDC Sink:

^C

Clean up the replication setup in PostgreSQL, then exit the psql client. In the psql client, run the following:

SELECT pg_drop_replication_slot('cdc_sink');
DROP PUBLICATION my_pub;
select * from pg_catalog.pg_publication;
\q

Now stop the PostgreSQL server. In a shell window, run:

ps aux | grep post
pg_ctl stop -D demodatadir
ps aux | grep post

Lastly, stop the CockroachDB server. In a shell window, run the following command to determine the process ID (PID) of the CockroachDB server:

ps aux | grep -i cockroach

Next, use this process ID in the kill command to terminate the CockroachDB server:

# substitute PID below
kill 99999
ps aux | grep -i cockroach

Closing

CDC Sink is a powerful tool that can be part of a strategy for low-downtime data migration into CockroachDB. We used CDC Sink to demonstrate how to replicate data from PostgreSQL to CockroachDB.

Data Sources

Use Cases

PostgreSQL as source database – steps

Prerequisites – Installation

Follow along

Start PostgreSQL

Create some data

Configure PostgreSQL write-ahead log level to logical

Create PostgreSQL publication and logical replication slot

Create PostgreSQL transaction snapshot

Create PostgreSQL data dump

Start CockroachDB

Create CockroachDB destination database

Import bulk data from PostgreSQL to CockroachDB

Prepare CockroachDB to use CDC Sink

Start CDC Sink to begin replication

Show logical replication in action

Demonstrate UPDATE replication

Demonstrate INSERT replication

Demonstrate DELETE replication

Cleanup

Closing

Leave a comment Cancel reply