
KStream and KTable concepts in Kafka - Commands & Configuration

Introduction
When working with data streams, you often need to process continuous flows of data or keep track of the latest state. KStream and KTable address these two different needs in Kafka Streams: a KStream models a stream of events, while a KTable models a table of current values. Typical scenarios include:
When you want to process each event in a stream as it arrives, like tracking user clicks in real time.
When you need to keep the latest state of data, such as the current balance of a bank account.
When you want to join a stream of events with a table of reference data, like enriching orders with product details.
When you want to aggregate data over time, like counting the number of sales per product.
When you want to update or delete records based on new incoming data.
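KStream and KTable belong to the Java Kafka Streams API, but the core distinction is easy to see in any language. As a plain-Python sketch (no Kafka involved, names invented for illustration), a KStream is like an append-only event log, while a KTable is like a latest-value-per-key view with upsert semantics:

```python
# Illustration only: the real KStream/KTable APIs live in the Java
# Kafka Streams library. This sketch shows only their semantics.

events = [
    ("user1", "click:home"),
    ("user2", "click:product"),
    ("user1", "click:cart"),
]

# KStream view: every event is kept and processed in arrival order.
kstream = list(events)

# KTable view: each new record for a key overwrites the previous one,
# so only the latest value per key survives.
ktable = {}
for key, value in events:
    ktable[key] = value

print(len(kstream))      # all 3 events are in the stream
print(ktable["user1"])   # only the latest state, "click:cart"
```

Note that both views are built from the same records; the difference is purely in how each new record is interpreted (append versus upsert).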
Commands
This command creates a Kafka topic named 'user-clicks' with 3 partitions to hold streaming event data.
Terminal
kafka-topics --create --topic user-clicks --bootstrap-server localhost:9092 --partitions 3 --replication-factor 1
Expected Output
Created topic user-clicks.
--topic - Specifies the name of the Kafka topic to create
--partitions - Sets the number of partitions for parallel processing
--replication-factor - Sets how many copies of the data are kept for fault tolerance
This command starts a producer to send events to the 'user-clicks' topic, simulating a stream of user click events.
Terminal
kafka-console-producer --topic user-clicks --bootstrap-server localhost:9092
Expected Output
No output; the producer shows a > prompt and waits for input. Each line you type is sent as one message.
--topic - Specifies the topic to send messages to
--bootstrap-server - Specifies the Kafka server address
This command reads all events from the 'user-clicks' topic from the start, showing the stream of events as they arrive.
Terminal
kafka-console-consumer --topic user-clicks --bootstrap-server localhost:9092 --from-beginning
Expected Output
user1,click,home
user2,click,product
user1,click,cart
--from-beginning - Reads all messages from the start of the topic
Apache Kafka does not ship a generic `kafka-streams-application` CLI; a Kafka Streams application is a Java program that you build and run yourself. Assuming you have packaged such an application as a jar and it accepts these (application-defined) flags, you might launch it like this:
Terminal
java -jar user-clicks-app.jar --application-id user-clicks-app --bootstrap-server localhost:9092 --input-topic user-clicks --output-topic user-clicks-count
Expected Output
Kafka Streams application started with application.id=user-clicks-app
--application-id - Unique ID for the Kafka Streams application (maps to the application.id config)
--input-topic - Topic to read the input stream from
--output-topic - Topic to write the processed output to
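Conceptually, such an application performs a grouped count: it groups click events by key and maintains a running count per user, which is what the Streams DSL's groupByKey().count() produces as a KTable. A minimal Python sketch of that logic (invented sample data, no Kafka client involved):

```python
from collections import Counter

# Click events as (key, value) pairs, as the console producer might send them.
clicks = [
    ("user1", "click,home"),
    ("user2", "click,product"),
    ("user1", "click,cart"),
]

# Equivalent of the DSL's groupByKey().count(): a table of counts per key,
# updated incrementally as each event arrives.
counts = Counter()
for key, _value in clicks:
    counts[key] += 1

print(dict(counts))  # {'user1': 2, 'user2': 1}
```

Each update to the count is itself a new record, which is why the output of such an aggregation is naturally a KTable (latest count per key) rather than a KStream.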
Key Concept

If you remember nothing else from this pattern, remember: KStream processes each event as it arrives, while KTable represents the latest state of data like a table.

Common Mistakes
Using KStream when you need to track the latest state of data
KStream treats data as a continuous event stream and does not keep the latest value, so you lose the current state.
Use KTable to represent and query the latest state of each key.
Trying to join two KStreams without considering event timing
Joining KStreams requires careful handling of event time windows; otherwise, you get incomplete or incorrect results.
Use windowed joins or join a KStream with a KTable for simpler state enrichment.
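The stream-table join recommended above can be sketched the same way: each arriving event looks up the current value in the table, so no event-time window is needed (illustration only, with invented data; the real API is KStream.join with a KTable):

```python
# Reference data (the KTable side): latest product details per key.
products = {
    "p1": "Coffee Mug",
    "p2": "Notebook",
}

# The event stream (the KStream side): orders keyed by product id.
orders = [("p1", "order-1001"), ("p2", "order-1002")]

# Stream-table join: enrich each order with the table's current value
# for that key at the moment the event is processed.
enriched = [(order_id, products.get(pid, "unknown")) for pid, order_id in orders]

print(enriched)  # [('order-1001', 'Coffee Mug'), ('order-1002', 'Notebook')]
```

Contrast this with a stream-stream join, where both sides are unbounded and events must fall into the same time window to match, which is why windowed joins need careful event-time handling.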
Not creating the Kafka topic before producing or consuming
Kafka commands fail if the topic does not exist or is misconfigured.
Always create topics with the right partitions and replication before use.
Summary
Create Kafka topics to hold streams or tables of data.
Use KStream to process each event in a data stream as it arrives.
Use KTable to keep and query the latest state of data.
Run Kafka Streams applications to process and transform streams and tables.
Consume and produce data to Kafka topics to see the flow of events.