Step-by-Step Guide to Implement CDC with Kafka and Debezium in Docker

This guide will walk you through implementing a Change Data Capture (CDC) pipeline using Kafka, Debezium, and MySQL in a Docker environment.

Prerequisites

  • Docker installed on your machine
  • Docker Compose installed
  • Basic understanding of Docker concepts
  • Basic familiarity with Kafka and databases

Step 1: Set Up Project Structure

  1. Create a new directory for your project:

    mkdir kafka-cdc-pipeline
    cd kafka-cdc-pipeline
  2. Create a conf directory for configuration files:

    mkdir conf
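
By the end of Step 3 the project will contain just two files, laid out like this:

    kafka-cdc-pipeline/
    ├── docker-compose.yml
    └── conf/
        └── mysql_config.json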

Step 2: Create Docker Compose File

  1. Create a docker-compose.yml file in your project directory with the following content:
services:
  mysql:
    image: debezium/example-mysql:1.8
    container_name: mysql
    hostname: mysql
    ports:
      - '3306:3306'
    environment:
      MYSQL_ROOT_PASSWORD: debezium
      MYSQL_USER: mysqluser
      MYSQL_PASSWORD: mysqlpw
 
  zookeeper:
    image: confluentinc/cp-zookeeper:5.5.3
    container_name: zookeeper
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
    ports:
      - '2181:2181'
 
  kafka:
    image: confluentinc/cp-enterprise-kafka:5.5.3
    container_name: kafka
    depends_on:
      - zookeeper
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
    links:
      - zookeeper
    ports:
      - '9092:9092'
 
  schema-registry:
    container_name: schema-registry
    image: confluentinc/cp-schema-registry:4.0.3
    environment:
      - SCHEMA_REGISTRY_KAFKASTORE_CONNECTION_URL=zookeeper:2181
      - SCHEMA_REGISTRY_HOST_NAME=schema-registry
      - SCHEMA_REGISTRY_LISTENERS=http://schema-registry:8081
    ports:
      - '8081:8081'
 
  kafka-connect:
    container_name: kafka-connect
    image: debezium/connect:1.8
    ports:
      - '8083:8083'
    links:
      - kafka
      - zookeeper
    environment:
      - BOOTSTRAP_SERVERS=kafka:9092
      - GROUP_ID=medium_debezium
      - CONFIG_STORAGE_TOPIC=my_connect_configs
      - CONFIG_STORAGE_REPLICATION_FACTOR=1
      - OFFSET_STORAGE_TOPIC=my_connect_offsets
      - OFFSET_STORAGE_REPLICATION_FACTOR=1
      - STATUS_STORAGE_TOPIC=my_connect_statuses
      - STATUS_STORAGE_REPLICATION_FACTOR=1
      - REST_ADVERTISED_HOST_NAME=medium_debezium
 
  connector-sender:
    container_name: connector-sender
    image: curlimages/curl
    volumes:
      - ./conf:/conf
    depends_on:
      - kafka-connect
    command:
      - sh
      - -c
      - |
        echo "Waiting for kafka connect to start..."
        until curl -s -o /dev/null -w "%{http_code}" http://kafka-connect:8083/connectors | grep -q "200"; do
          sleep 1
        done
        echo -e "Sending connector"
        curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" http://kafka-connect:8083/connectors/ -d @/conf/mysql_config.json
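
The connector-sender container simply waits until the Kafka Connect REST API responds and then POSTs the connector configuration you will create in Step 3. Before moving on, you can run a quick sanity check on the file; it should validate silently and exit 0 if the YAML is well-formed:

    docker-compose config --quiet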

Step 3: Create MySQL Connector Configuration

  1. In the conf directory, create a file named mysql_config.json with the following content:
{
    "name": "medium_debezium",
    "config": {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "database.hostname": "mysql",
        "database.port": "3306",
        "database.user": "root",
        "database.password": "debezium",
        "database.server.id": "184054",
        "database.server.name": "dbserver1",
        "database.include.list": "inventory",
        "database.history.kafka.bootstrap.servers": "kafka:9092",
        "database.history.kafka.topic": "schema-changes.inventory",
        "table.include.list": "inventory.customers, inventory.orders",
        "column.exclude.list": "inventory.customers.email",
        "snapshot.mode": "initial",
        "snapshot.locking.mode": "none"
    }
}
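
The connector-sender container sends this file automatically, but if you ever need to register (or re-register) the connector by hand, the same request can be made from your host once Kafka Connect is up, since port 8083 is published:

    curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" http://localhost:8083/connectors/ -d @conf/mysql_config.json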

Step 4: Start the Services

  1. Run the following command to start all services:

    docker-compose up -d
  2. Monitor the logs to ensure all services start correctly:

    docker-compose logs -f
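
You can also confirm the state of the containers at a glance:

    docker-compose ps

All services should show as running except connector-sender, which exits once it has registered the connector.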

Step 5: Verify the Setup

  1. Check if the connector was created successfully:

    curl -s http://localhost:8083/connectors | jq

    You should see ["medium_debezium"] in the output.

  2. Check the status of your connector:

    curl -s http://localhost:8083/connectors/medium_debezium/status | jq
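
    Both the connector and its single task should report RUNNING. To pull out just the states, you can filter the status response with jq, for example:

    curl -s http://localhost:8083/connectors/medium_debezium/status | jq '{connector: .connector.state, tasks: [.tasks[].state]}'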

Step 6: Explore Kafka Topics

  1. Enter the Kafka container:

    docker exec -it kafka bash
  2. List all Kafka topics:

    kafka-topics --bootstrap-server=localhost:9092 --list

    You should see topics like dbserver1.inventory.customers and dbserver1.inventory.orders.

  3. Consume messages from a topic:

    kafka-console-consumer --bootstrap-server localhost:9092 --topic dbserver1.inventory.customers --from-beginning
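
    If you also want to see each message's key, which carries the row's primary key, the console consumer can print it alongside the value:

    kafka-console-consumer --bootstrap-server localhost:9092 --topic dbserver1.inventory.customers --from-beginning --property print.key=true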

Step 7: Test CDC Functionality

  1. Connect to the MySQL database:

    docker exec -it mysql mysql -u root -pdebezium
  2. Use the inventory database:

    USE inventory;
  3. Make some changes to the data:

    INSERT INTO customers (first_name, last_name, email) VALUES ('John', 'Doe', 'john.doe@example.com');
    UPDATE customers SET first_name = 'Jane' WHERE first_name = 'John';
    DELETE FROM customers WHERE first_name = 'Jane';
  4. Observe the changes in the Kafka consumer you started earlier.
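
Each change arrives as a Debezium envelope: its payload holds the row state before and after the change plus an op field (r for a snapshot read, c for create, u for update, d for delete, followed by a null tombstone message). Since the debezium/connect image defaults to the JSON converter, you can reduce the events to just those fields with jq from your host, for example:

    docker exec kafka kafka-console-consumer --bootstrap-server localhost:9092 --topic dbserver1.inventory.customers --from-beginning --max-messages 5 | jq '.payload | {op, before, after}'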

Step 8: Clean Up

When you’re done testing, you can stop and remove all containers:

docker-compose down
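
If you also want to discard the MySQL data (the image stores it in an anonymous Docker volume), add the -v flag:

docker-compose down -v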

Additional Notes

  1. Schema Registry: The schema registry is running on port 8081 if you need to access it.

  2. Kafka Connect REST API: The Connect REST API is available at http://localhost:8083.

  3. Customizing the Setup:

    • To monitor the pipeline, consider adding Kafka UI tools like AKHQ or Conduktor.
    • For production use, you’ll want to add proper security configurations.
  4. Troubleshooting:

    • If services don’t start properly, check logs with docker-compose logs [service_name].
    • Ensure ports 3306, 2181, 9092, 8081, and 8083 are available on your host machine.
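
    For example, a quick way to spot connector failures is to grep the Kafka Connect logs:

    docker-compose logs kafka-connect | grep -iE "error|exception"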

This setup provides a complete CDC pipeline that captures changes from MySQL and streams them to Kafka topics in real time. You can now extend it by adding sink connectors that push the data into other systems or by building stream processing applications on top of the topics.