
Kafka Connect architecture - Commands & Configuration

Introduction
Kafka Connect moves data between Kafka and other systems automatically, so you don't have to write custom integration code to connect databases, files, or other services to Kafka.
  • When you want to copy data from a database into Kafka without writing code
  • When you need to export Kafka data to a search engine like Elasticsearch
  • When you want to stream data from files or logs into Kafka topics
  • When you want to keep data synchronized between Kafka and external systems
  • When you want to run connectors in a scalable, fault-tolerant way
Config File - connect-distributed.properties
connect-distributed.properties
bootstrap.servers=localhost:9092
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false
config.storage.topic=connect-configs
offset.storage.topic=connect-offsets
status.storage.topic=connect-status
config.storage.replication.factor=1
offset.storage.replication.factor=1
status.storage.replication.factor=1
rest.port=8083

This file configures Kafka Connect in distributed mode.

  • bootstrap.servers: Kafka brokers to connect to.
  • key.converter and value.converter: How data is serialized.
  • config.storage.topic, offset.storage.topic, status.storage.topic: Internal topics to store connector configs, offsets, and status.
  • replication.factor: Number of copies for fault tolerance.
  • rest.port: Port for REST API to manage connectors.
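For local experiments, Connect can also run in standalone mode, where offsets are kept in a local file instead of internal Kafka topics. A minimal sketch (the file path is illustrative):

```properties
# connect-standalone.properties — minimal standalone-mode sketch
bootstrap.servers=localhost:9092
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false
# Standalone mode tracks offsets in a local file, not in Kafka topics
offset.storage.file.filename=/tmp/connect.offsets
```

Standalone mode is simpler to set up but offers no fault tolerance or scaling, which is why distributed mode is preferred beyond development.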
Commands
Starts Kafka Connect in distributed mode using the configuration file. This runs the service that manages connectors and tasks.
Terminal
connect-distributed.sh connect-distributed.properties
Expected Output
[2024-06-01 12:00:00,000] INFO Kafka Connect distributed worker started (org.apache.kafka.connect.runtime.ConnectDistributed)
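Once the worker is up, its REST API answers on port 8083; the root endpoint reports the worker's version information. A quick health check (the fallback message only fires if the worker isn't reachable yet):

```shell
# Probe the Connect REST API; the root endpoint returns version info
# as JSON. Prints a fallback message if the worker is not up yet.
curl -s http://localhost:8083/ || echo "worker not reachable yet"
```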
Creates a new source connector that reads from a file and writes to a Kafka topic, showing how to add connectors via the REST API.
Terminal
curl -X POST -H "Content-Type: application/json" --data '{"name": "my-source-connector", "config": {"connector.class": "FileStreamSource", "tasks.max": "1", "file": "/tmp/input.txt", "topic": "my-topic"}}' http://localhost:8083/connectors
Expected Output
{"name":"my-source-connector","config":{"connector.class":"FileStreamSource","tasks.max":"1","file":"/tmp/input.txt","topic":"my-topic"},"tasks":[],"type":"source"}
Lists all connectors currently running in Kafka Connect. This verifies the connector was created successfully.
Terminal
curl http://localhost:8083/connectors
Expected Output
["my-source-connector"]
Shows the status of a specific connector and its tasks to check whether it is running properly.
Terminal
curl http://localhost:8083/connectors/my-source-connector/status
Expected Output
{"name":"my-source-connector","connector":{"state":"RUNNING","worker_id":"connect-worker-1"},"tasks":[{"id":0,"state":"RUNNING","worker_id":"connect-worker-1"}]}
Key Concept

If you remember nothing else from this pattern, remember: Kafka Connect runs connectors as separate tasks managed by a distributed service that handles data movement automatically.

Common Mistakes
Not setting the internal storage topics in the config file
Kafka Connect needs these topics to store connector configs, offsets, and status. Without them, it cannot track progress or recover.
Always define config.storage.topic, offset.storage.topic, and status.storage.topic with proper replication factors.
Starting Kafka Connect without Kafka brokers running
Kafka Connect depends on Kafka brokers to send and receive data. Without brokers, it will fail to start or connect.
Make sure Kafka brokers are running and reachable at the bootstrap.servers address before starting Kafka Connect.
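One way to check reachability before starting the worker is bash's built-in TCP support; the host and port here match the bootstrap.servers value in the config above:

```shell
# Check whether a Kafka broker is listening on the bootstrap address.
# Uses bash's /dev/tcp pseudo-device; prints one line either way.
if (echo > /dev/tcp/localhost/9092) 2>/dev/null; then
  echo "broker reachable at localhost:9092"
else
  echo "broker NOT reachable at localhost:9092"
fi

# With the Kafka CLI tools available, this also confirms a live broker:
# kafka-broker-api-versions.sh --bootstrap-server localhost:9092
```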
Using incompatible converters for key and value
If converters do not match the data format, connectors will fail to serialize or deserialize data correctly.
Use matching converters like JsonConverter for both key and value or configure them according to your data format.
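Converters can also be overridden per connector, which helps when one connector's data format differs from the worker default. A sketch of the example connector's JSON config with a value-converter override (the built-in StringConverter treats values as plain strings):

```json
{
  "name": "my-source-connector",
  "config": {
    "connector.class": "FileStreamSource",
    "tasks.max": "1",
    "file": "/tmp/input.txt",
    "topic": "my-topic",
    "value.converter": "org.apache.kafka.connect.storage.StringConverter"
  }
}
```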
Summary
Kafka Connect runs as a distributed service managing connectors and tasks to move data automatically.
You configure Kafka Connect with a properties file specifying Kafka brokers, converters, and internal topics.
Connectors are created and managed via REST API calls to add, list, and check status.