Kafka · DevOps · ~5 mins

Stream topology in Kafka - Commands & Configuration

Introduction
When you want to process data streams in real time, you need to define how data flows and transforms between different steps. A stream topology is a map of these steps and connections, helping you organize and run your data processing smoothly.
When you want to filter and transform live data from sensors before saving it.
When you need to join two streams of data, like user clicks and purchases, to analyze behavior.
When you want to aggregate data over time, such as counting events per minute.
When you want to build a pipeline that reads from one topic, processes data, and writes to another topic.
When you want to visualize or debug how your streaming application processes data step-by-step.
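The step-by-step flow described above can be sketched in plain Python, with no Kafka involved: each processing node consumes from the previous one, mirroring how a Kafka Streams topology chains a source through processors to a sink. This is a conceptual illustration only; the node names are hypothetical.

```python
def source(records):
    """Source node: emits raw records (stand-in for an input topic)."""
    yield from records

def filter_step(stream, predicate):
    """Filter node: drops records that fail the predicate."""
    return (r for r in stream if predicate(r))

def map_step(stream, fn):
    """Map node: transforms each record."""
    return (fn(r) for r in stream)

# Example: filter sensor readings, then reformat them.
readings = [{"sensor": "a", "value": 3},
            {"sensor": "b", "value": 42},
            {"sensor": "a", "value": 17}]

topology = map_step(
    filter_step(source(readings), lambda r: r["value"] > 10),
    lambda r: f'{r["sensor"]}:{r["value"]}',
)

print(list(topology))  # ['b:42', 'a:17']
```

Because each step is lazy, records flow through the whole chain one at a time, which is the same shape of computation a real stream topology describes.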
Config File - stream_topology.properties
stream_topology.properties
application.id=stream-topology-example
bootstrap.servers=localhost:9092
processing.guarantee=exactly_once
cache.max.bytes.buffering=10485760
commit.interval.ms=1000

This file sets up the Kafka Streams application properties:

  • application.id: Unique name for your stream app.
  • bootstrap.servers: Kafka server address.
  • processing.guarantee: exactly_once gives exactly-once processing semantics (the default is at_least_once).
  • cache.max.bytes.buffering: Controls memory buffering for performance.
  • commit.interval.ms: How often to save progress.
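To see what the config file boils down to, here is a minimal sketch of a key=value properties loader in Python (it handles blank lines and `#` comments, but is not a full implementation of the Java Properties format):

```python
def load_properties(text):
    """Parse simple key=value lines into a dict, skipping blanks and comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

config = load_properties("""\
application.id=stream-topology-example
bootstrap.servers=localhost:9092
processing.guarantee=exactly_once
""")
print(config["processing.guarantee"])  # exactly_once
```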
Commands
Create an input topic with 3 partitions to receive streaming data.
Terminal
kafka-topics --create --topic input-topic --bootstrap-server localhost:9092 --partitions 3 --replication-factor 1
Expected Output
Created topic input-topic.
--partitions - Number of partitions for parallelism
--replication-factor - Number of copies for fault tolerance
Create an output topic where processed data will be sent.
Terminal
kafka-topics --create --topic output-topic --bootstrap-server localhost:9092 --partitions 3 --replication-factor 1
Expected Output
Created topic output-topic.
--partitions - Number of partitions for parallelism
--replication-factor - Number of copies for fault tolerance
Start a producer to send sample data to the input topic for processing.
Terminal
kafka-console-producer --topic input-topic --bootstrap-server localhost:9092
Expected Output
No output; the producer shows a > prompt and sends each line you type as a message
--topic - Topic to send data to
--bootstrap-server - Kafka server address
Consume and display processed data from the output topic to verify the stream topology works.
Terminal
kafka-console-consumer --topic output-topic --bootstrap-server localhost:9092 --from-beginning
Expected Output
processed-data-example
--from-beginning - Read all messages from the start
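The --partitions 3 setting used when creating the topics is what enables parallelism: keyed messages are spread across partitions by a hash of the key, so up to three consumers can read in parallel while per-key ordering is preserved. The sketch below illustrates the idea; real Kafka hashes keys with murmur2, and the CRC32 here is only a deterministic stand-in.

```python
import zlib

NUM_PARTITIONS = 3

def partition_for(key):
    """Deterministic stand-in for Kafka's murmur2-based key partitioning."""
    return zlib.crc32(key.encode()) % NUM_PARTITIONS

# One list per partition, standing in for a 3-partition topic.
topic = [[] for _ in range(NUM_PARTITIONS)]

for key, value in [("user-1", "click"), ("user-2", "click"), ("user-1", "buy")]:
    topic[partition_for(key)].append((key, value))

# All messages with the same key land in the same partition,
# so ordering is preserved per key.
print([v for k, v in topic[partition_for("user-1")] if k == "user-1"])
```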
Key Concept

If you remember nothing else from this pattern, remember: a stream topology defines how data flows and transforms step-by-step in a Kafka Streams application.

Common Mistakes
Not creating the input and output topics before running the stream application.
The stream app will fail because it cannot read or write data without these topics.
Always create all required topics with correct partitions and replication before starting the stream.
Using the same topic for input and output in the topology.
This can cause infinite loops or data corruption in the stream processing.
Use separate topics for input and output to keep data flow clear and safe.
Not setting the application.id property uniquely for each stream app instance.
Kafka Streams uses this ID to track state; duplicates cause conflicts and errors.
Set a unique application.id in the properties file for each distinct stream application.
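The input-equals-output mistake above can be seen in a toy simulation: when a processor writes back to the topic it reads from, every processed record becomes new input and the stream never drains. This is a conceptual sketch using in-memory queues, not Kafka itself.

```python
from collections import deque

def run(topic_in, topic_out, max_steps=10):
    """Pop records from topic_in, 'process' them (uppercase), append to topic_out.
    Stops when the input drains or the safety cap is hit; returns steps taken."""
    steps = 0
    while topic_in and steps < max_steps:
        topic_out.append(topic_in.popleft().upper())
        steps += 1
    return steps

print(run(deque(["a", "b"]), deque()))  # 2: separate topics drain normally

loop = deque(["a", "b"])
print(run(loop, loop))                  # 10: output feeds input, never drains
```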
Summary
Create input and output Kafka topics to hold streaming data.
Configure the stream application with properties like application.id and bootstrap servers.
Send data to the input topic and consume processed data from the output topic to verify the flow.