Kafka · DevOps · ~5 mins

Stream topology in Kafka - Commands & Configuration

Introduction
When you want to process data streams in real time, you need to define how data flows and transforms between different steps. A stream topology is a map of these steps and connections, helping you organize and run your data processing smoothly.
When you want to filter and transform live data from sensors before saving it.
When you need to join two streams of data, like user clicks and purchases, to analyze behavior.
When you want to aggregate data over time, such as counting events per minute.
When you want to build a pipeline that reads from one topic, processes data, and writes to another topic.
When you want to visualize or debug how your streaming application processes data step-by-step.
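The step-by-step flow described above can be sketched in plain Python, with no Kafka involved: each processing node consumes from the previous one, mirroring how a Kafka Streams topology chains a source through processors to a sink. This is a conceptual illustration only; the node names are hypothetical.

```python
def source(records):
    """Source node: emits raw records (stand-in for an input topic)."""
    yield from records

def filter_step(stream, predicate):
    """Filter node: drops records that fail the predicate."""
    return (r for r in stream if predicate(r))

def map_step(stream, fn):
    """Map node: transforms each record."""
    return (fn(r) for r in stream)

# Example: filter sensor readings, then reformat them.
readings = [{"sensor": "a", "value": 3},
            {"sensor": "b", "value": 42},
            {"sensor": "a", "value": 17}]

topology = map_step(
    filter_step(source(readings), lambda r: r["value"] > 10),
    lambda r: f'{r["sensor"]}:{r["value"]}',
)

print(list(topology))  # ['b:42', 'a:17']
```

Because each step is lazy, records flow through the whole chain one at a time, which is the same shape of computation a real stream topology describes.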
Config File - stream_topology.properties
stream_topology.properties
application.id=stream-topology-example
bootstrap.servers=localhost:9092
processing.guarantee=exactly_once
cache.max.bytes.buffering=10485760
commit.interval.ms=1000

This file sets up the Kafka Streams application properties:

  • application.id: Unique name for your stream app.
  • bootstrap.servers: Kafka server address.
  • processing.guarantee: exactly_once gives exactly-once processing semantics (the default is at_least_once).
  • cache.max.bytes.buffering: Controls memory buffering for performance.
  • commit.interval.ms: How often to save progress.
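To see what the config file boils down to, here is a minimal sketch of a key=value properties loader in Python (it handles blank lines and `#` comments, but is not a full implementation of the Java Properties format):

```python
def load_properties(text):
    """Parse simple key=value lines into a dict, skipping blanks and comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

config = load_properties("""\
application.id=stream-topology-example
bootstrap.servers=localhost:9092
processing.guarantee=exactly_once
""")
print(config["processing.guarantee"])  # exactly_once
```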
Commands
Create an input topic with 3 partitions to receive streaming data.
Terminal
kafka-topics --create --topic input-topic --bootstrap-server localhost:9092 --partitions 3 --replication-factor 1
Expected Output
Created topic input-topic.
--partitions - Number of partitions for parallelism
--replication-factor - Number of copies for fault tolerance
Create an output topic where processed data will be sent.
Terminal
kafka-topics --create --topic output-topic --bootstrap-server localhost:9092 --partitions 3 --replication-factor 1
Expected Output
Created topic output-topic.
--partitions - Number of partitions for parallelism
--replication-factor - Number of copies for fault tolerance
Start a producer to send sample data to the input topic for processing.
Terminal
kafka-console-producer --topic input-topic --bootstrap-server localhost:9092
Expected Output
No output; the producer shows a > prompt and sends each line you type as a message
--topic - Topic to send data to
--bootstrap-server - Kafka server address
Consume and display processed data from the output topic to verify the stream topology works.
Terminal
kafka-console-consumer --topic output-topic --bootstrap-server localhost:9092 --from-beginning
Expected Output
processed-data-example
--from-beginning - Read all messages from the start
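The --partitions 3 setting used when creating the topics is what enables parallelism: keyed messages are spread across partitions by a hash of the key, so up to three consumers can read in parallel while per-key ordering is preserved. The sketch below illustrates the idea; real Kafka hashes keys with murmur2, and the CRC32 here is only a deterministic stand-in.

```python
import zlib

NUM_PARTITIONS = 3

def partition_for(key):
    """Deterministic stand-in for Kafka's murmur2-based key partitioning."""
    return zlib.crc32(key.encode()) % NUM_PARTITIONS

# One list per partition, standing in for a 3-partition topic.
topic = [[] for _ in range(NUM_PARTITIONS)]

for key, value in [("user-1", "click"), ("user-2", "click"), ("user-1", "buy")]:
    topic[partition_for(key)].append((key, value))

# All messages with the same key land in the same partition,
# so ordering is preserved per key.
print([v for k, v in topic[partition_for("user-1")] if k == "user-1"])
```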
Key Concept

If you remember nothing else from this pattern, remember: a stream topology defines how data flows and transforms step-by-step in a Kafka Streams application.

Common Mistakes
Not creating the input and output topics before running the stream application.
The stream app will fail because it cannot read or write data without these topics.
Always create all required topics with correct partitions and replication before starting the stream.
Using the same topic for input and output in the topology.
This can cause infinite loops or data corruption in the stream processing.
Use separate topics for input and output to keep data flow clear and safe.
Not setting the application.id property uniquely for each stream app instance.
Kafka Streams uses this ID to track state; duplicates cause conflicts and errors.
Set a unique application.id in the properties file for each distinct stream application.
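The input-equals-output mistake above can be seen in a toy simulation: when a processor writes back to the topic it reads from, every processed record becomes new input and the stream never drains. This is a conceptual sketch using in-memory queues, not Kafka itself.

```python
from collections import deque

def run(topic_in, topic_out, max_steps=10):
    """Pop records from topic_in, 'process' them (uppercase), append to topic_out.
    Stops when the input drains or the safety cap is hit; returns steps taken."""
    steps = 0
    while topic_in and steps < max_steps:
        topic_out.append(topic_in.popleft().upper())
        steps += 1
    return steps

print(run(deque(["a", "b"]), deque()))  # 2: separate topics drain normally

loop = deque(["a", "b"])
print(run(loop, loop))                  # 10: output feeds input, never drains
```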
Summary
Create input and output Kafka topics to hold streaming data.
Configure the stream application with properties like application.id and bootstrap servers.
Send data to the input topic and consume processed data from the output topic to verify the flow.