Kafka · DevOps · ~30 mins

Common connectors (JDBC, S3, Elasticsearch) in Kafka - Mini Project: Build & Apply

Using Kafka Connect with Common Connectors (JDBC, S3, Elasticsearch)
📖 Scenario: You work at a company that collects data from various sources and wants to move it efficiently using Kafka. You will learn how to set up Kafka Connect with common connectors to move data from a database, store files in S3, and index data in Elasticsearch.
🎯 Goal: Build a simple Kafka Connect setup that uses the JDBC source connector to read data from a database, the S3 sink connector to save data files, and the Elasticsearch sink connector to index data for search.
📋 What You'll Learn
Create a Kafka Connect configuration for the JDBC source connector
Add a configuration variable for the S3 bucket name
Write the core logic to configure the Elasticsearch sink connector
Print the final combined Kafka Connect configuration
💡 Why This Matters
🌍 Real World
Companies use Kafka Connect to move data between systems like databases, cloud storage, and search engines automatically.
💼 Career
Knowing how to configure Kafka Connect with common connectors is useful for data engineers and backend developers working with streaming data pipelines.
1
Create JDBC Source Connector Configuration
Create a dictionary called jdbc_source_config with these exact entries: "name": "jdbc_source_connector", "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector", "connection.url": "jdbc:postgresql://localhost:5432/mydb", "table.whitelist": "users", "mode": "incrementing", and "incrementing.column.name": "id".
Need a hint?

Use a Python dictionary with the exact keys and values given.
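If you get stuck, the step boils down to one dictionary literal with the exact keys and values listed above (the connection URL and table name are the ones given in the step, not values you need to invent):

```python
# JDBC source connector config: reads new rows from the "users" table,
# tracking progress via the auto-incrementing "id" column.
jdbc_source_config = {
    "name": "jdbc_source_connector",
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:postgresql://localhost:5432/mydb",
    "table.whitelist": "users",           # only pull from the users table
    "mode": "incrementing",               # detect new rows by an increasing column
    "incrementing.column.name": "id",
}
```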

2
Add S3 Bucket Configuration Variable
Create a variable called s3_bucket_name and set it to the string "my-kafka-data-bucket".
Need a hint?

Assign the exact string to the variable s3_bucket_name.
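This step is a single assignment. In a full S3 sink connector config this value would feed the connector's bucket setting, but for now you only need the variable itself:

```python
# Name of the S3 bucket the sink connector will write topic data into.
s3_bucket_name = "my-kafka-data-bucket"
```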

3
Configure Elasticsearch Sink Connector
Create a dictionary called elasticsearch_sink_config with these exact entries: "name": "elasticsearch_sink_connector", "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector", "connection.url": "http://localhost:9200", and "topics": "users".
Need a hint?

Use a dictionary with the exact keys and values given for Elasticsearch sink.
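As with step 1, this is one dictionary literal built from the exact entries listed in the step:

```python
# Elasticsearch sink connector config: indexes records from the
# "users" topic into a local Elasticsearch instance.
elasticsearch_sink_config = {
    "name": "elasticsearch_sink_connector",
    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "connection.url": "http://localhost:9200",  # Elasticsearch REST endpoint
    "topics": "users",                          # Kafka topic to index
}
```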

4
Print Final Kafka Connect Configurations
Write a print statement to display the combined dictionary called all_connectors that contains the keys "jdbc_source", "s3_bucket", and "elasticsearch_sink" with values jdbc_source_config, s3_bucket_name, and elasticsearch_sink_config respectively.
Need a hint?

Create a dictionary with the exact keys and values, then print it.
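Putting the whole mini-project together, the final step nests the values from steps 1–3 under the three keys named above and prints the result (the earlier definitions are repeated here so the sketch runs on its own):

```python
# Configurations from steps 1-3.
jdbc_source_config = {
    "name": "jdbc_source_connector",
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:postgresql://localhost:5432/mydb",
    "table.whitelist": "users",
    "mode": "incrementing",
    "incrementing.column.name": "id",
}
s3_bucket_name = "my-kafka-data-bucket"
elasticsearch_sink_config = {
    "name": "elasticsearch_sink_connector",
    "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
    "connection.url": "http://localhost:9200",
    "topics": "users",
}

# Step 4: combine everything under the required keys and display it.
all_connectors = {
    "jdbc_source": jdbc_source_config,
    "s3_bucket": s3_bucket_name,
    "elasticsearch_sink": elasticsearch_sink_config,
}
print(all_connectors)
```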