Kafka · DevOps · ~10 mins

Join operations (KStream-KStream, KStream-KTable) in Kafka - Commands & Configuration

Introduction
When you have two streams of data flowing through Kafka and want to combine related information from both, join operations are the tool for the job. They merge records from two streams, or from a stream and a table, based on matching keys, giving you a richer, combined view.
When you want to combine user click events with user profile updates in real-time.
When you need to enrich order events with product details stored in a table.
When you want to correlate sensor readings from two different devices streaming data.
When you want to join a stream of transactions with a table of account balances to check limits.
When you want to merge two streams of logs from different services based on timestamps.
Config File - stream-join.properties
application.id=stream-join-app
bootstrap.servers=localhost:9092
processing.guarantee=exactly_once
cache.max.bytes.buffering=0

# Serdes for keys and values
default.key.serde=org.apache.kafka.common.serialization.Serdes$StringSerde
default.value.serde=org.apache.kafka.common.serialization.Serdes$StringSerde

# Commit interval
commit.interval.ms=1000

This configuration gives the Kafka Streams application a unique application ID and connects it to the Kafka broker at localhost:9092. The exactly-once processing guarantee prevents duplicate join results, and setting the record cache to 0 bytes makes joined output flow downstream immediately instead of being buffered. The default key and value serdes are set to handle strings, and the commit interval controls how often the app commits its progress.
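Many Streams applications build these same settings in code instead of loading a file. As a sketch, the following mirrors stream-join.properties using the Kafka Streams property names as plain strings (the class name is illustrative):

```java
import java.util.Properties;

public class StreamJoinConfig {
    // Builds the same settings as stream-join.properties.
    public static Properties build() {
        Properties props = new Properties();
        props.put("application.id", "stream-join-app");
        props.put("bootstrap.servers", "localhost:9092");
        props.put("processing.guarantee", "exactly_once");
        props.put("cache.max.bytes.buffering", "0");
        // Default serdes for keys and values (strings in this example).
        props.put("default.key.serde", "org.apache.kafka.common.serialization.Serdes$StringSerde");
        props.put("default.value.serde", "org.apache.kafka.common.serialization.Serdes$StringSerde");
        props.put("commit.interval.ms", "1000");
        return props;
    }
}
```

This Properties object can be passed straight to the KafkaStreams constructor, so the file and the programmatic form are interchangeable.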

Commands
Create the 'user-clicks' topic where click event data will be streamed.
Terminal
kafka-topics --create --topic user-clicks --bootstrap-server localhost:9092 --partitions 3 --replication-factor 1
Expected Output
Created topic user-clicks.
--partitions - Number of partitions for parallelism
--replication-factor - Number of copies for fault tolerance
Create the 'user-profiles' topic which will be used as a table for user profile data.
Terminal
kafka-topics --create --topic user-profiles --bootstrap-server localhost:9092 --partitions 3 --replication-factor 1
Expected Output
Created topic user-profiles.
--partitions - Number of partitions for parallelism
--replication-factor - Number of copies for fault tolerance
Start a producer to send sample click events to the 'user-clicks' topic. Records must carry a key (the user ID) for the joins to match, so enable key parsing.
Terminal
kafka-console-producer --topic user-clicks --bootstrap-server localhost:9092 --property parse.key=true --property key.separator=:
Expected Output
No output (type keyed messages such as user1:home and press Enter)
--topic - Topic to send messages to
--bootstrap-server - Kafka server address
--property parse.key=true - Treat the text before the separator as the record key
Start a producer to send user profile updates to the 'user-profiles' topic, which will be materialized as a table. Profile records must use the same user-ID keys as the clicks, so enable key parsing here too.
Terminal
kafka-console-producer --topic user-profiles --bootstrap-server localhost:9092 --property parse.key=true --property key.separator=:
Expected Output
No output (type keyed messages such as user1:premium and press Enter)
--topic - Topic to send messages to
--bootstrap-server - Kafka server address
--property parse.key=true - Treat the text before the separator as the record key
Run the Kafka Streams application that performs KStream-KStream and KStream-KTable joins using the configured properties.
Terminal
java -jar stream-join-app.jar
Expected Output
INFO Kafka Streams started
INFO Performing KStream-KStream join on user-clicks and another stream
INFO Performing KStream-KTable join on user-clicks and user-profiles table
INFO Streams running...
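The source of stream-join-app.jar is not shown on this page, so here is a minimal sketch of what it might contain. The second stream's topic ('user-purchases'), the output JSON shape, and the class name are illustrative assumptions; the windowed-join API shown requires Kafka 3.0+ (older versions use JoinWindows.of(Duration) instead):

```java
import java.time.Duration;
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.JoinWindows;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;

public class StreamJoinApp {

    public static Topology buildTopology() {
        StreamsBuilder builder = new StreamsBuilder();

        // Stream of click events, keyed by user ID.
        KStream<String, String> clicks = builder.stream("user-clicks");

        // Changelog of profile updates, materialized as a table holding
        // the latest profile per user ID.
        KTable<String, String> profiles = builder.table("user-profiles");

        // KStream-KTable join: enrich each click with the user's current profile.
        clicks.join(profiles, (click, profile) ->
                "{\"click\":\"" + click + "\",\"profile\":\"" + profile + "\"}")
              .to("enriched-clicks");

        // KStream-KStream join: correlate clicks with a second event stream
        // (the 'user-purchases' topic is a hypothetical example) within a
        // 5-minute window.
        KStream<String, String> purchases = builder.stream("user-purchases");
        clicks.join(purchases,
                (click, purchase) -> click + "|" + purchase,
                JoinWindows.ofTimeDifferenceWithNoGrace(Duration.ofMinutes(5)))
              .to("clicks-with-purchases");

        return builder.build();
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "stream-join-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        KafkaStreams streams = new KafkaStreams(buildTopology(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

Note that the KStream-KTable join needs no window: the table always holds the latest value per key, so each click is simply looked up against it.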
Consume the output topic 'enriched-clicks' to see the results of the join operations.
Terminal
kafka-console-consumer --topic enriched-clicks --bootstrap-server localhost:9092 --from-beginning
Expected Output
{"userId":"user1","click":"home","profile":"premium"}
{"userId":"user2","click":"search","profile":"basic"}
--from-beginning - Read all messages from the start
Key Concept

If you remember nothing else from this pattern, remember: joining streams lets you combine live data flows to enrich or correlate information in real time based on matching keys.

Common Mistakes
Using different key serializers for the two streams or tables being joined
The join will fail because Kafka Streams cannot match keys if they are serialized differently.
Ensure both streams and tables use the same key serde configuration.
Trying to join streams without setting appropriate windowing for KStream-KStream joins
KStream-KStream joins require a time window: both streams are unbounded, so without a window the join has no way to decide which records are close enough in time to be allowed to match.
Use a time window (e.g., 5 minutes) to define the join period for KStream-KStream joins.
Not creating the topics before running the streams application
The application will fail to start or produce errors because the input or output topics do not exist.
Create all required topics with correct partitions and replication before starting the app.
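To illustrate the windowing fix from the second mistake above, here is a sketch of a KStream-KStream join with an explicit 5-minute window. The topic and class names are hypothetical, and the API shown requires Kafka 3.0+ (older versions use JoinWindows.of(Duration)):

```java
import java.time.Duration;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.JoinWindows;
import org.apache.kafka.streams.kstream.KStream;

public class WindowedJoinSketch {
    public static Topology build() {
        StreamsBuilder builder = new StreamsBuilder();
        // Hypothetical topics carrying readings from two devices,
        // keyed so that related readings share the same key.
        KStream<String, String> deviceA = builder.stream("device-a-readings");
        KStream<String, String> deviceB = builder.stream("device-b-readings");

        // Two records join only when their keys match AND their
        // timestamps fall within 5 minutes of each other.
        deviceA.join(deviceB,
                (a, b) -> a + "|" + b,
                JoinWindows.ofTimeDifferenceWithNoGrace(Duration.ofMinutes(5)))
               .to("correlated-readings");

        return builder.build();
    }
}
```

Without the JoinWindows argument this code would not compile for a KStream-KStream join, which is how the API forces you to think about the time bound.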
Summary
Create Kafka topics for streams and tables before starting the application.
Configure the Kafka Streams app with matching key and value serializers.
Run the app to perform KStream-KStream and KStream-KTable joins to combine data.
Consume the output topic to verify the joined enriched data.