What is KStream in Kafka: Definition and Usage
KStream in Kafka is a core abstraction in the Kafka Streams API that represents a continuous stream of records, where each record is a key-value pair. It allows developers to process and transform real-time data streams in a scalable and fault-tolerant way.
How It Works
Think of KStream as a never-ending conveyor belt carrying data records one by one. Each record has a key and a value, like a label and its content. As new data arrives, KStream lets you apply operations such as filtering, mapping, or joining, similar to sorting or modifying items on the conveyor belt.
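The conveyor-belt picture can be sketched as a small topology that filters and then maps each record. This is a minimal sketch, not a definitive implementation: the topic names `raw-events` and `clean-events`, the class name, and the helper methods `isUsable` and `normalize` are assumptions made for illustration. Describing the topology does not need a running broker, but the Kafka Streams library must be on the classpath.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;

public class ConveyorBeltSketch {

    // Hypothetical rule: keep only records whose value is present and not blank.
    static boolean isUsable(String value) {
        return value != null && !value.trim().isEmpty();
    }

    // Hypothetical transformation: strip surrounding whitespace from the value.
    static String normalize(String value) {
        return value.trim();
    }

    static Topology buildTopology() {
        StreamsBuilder builder = new StreamsBuilder();

        // "raw-events" and "clean-events" are placeholder topic names.
        KStream<String, String> belt =
                builder.stream("raw-events", Consumed.with(Serdes.String(), Serdes.String()));

        belt.filter((key, value) -> isUsable(value))   // drop unusable items from the belt
            .mapValues(value -> normalize(value))      // modify each remaining item
            .to("clean-events", Produced.with(Serdes.String(), Serdes.String()));

        return builder.build();
    }

    public static void main(String[] args) {
        // Prints the processor graph: source -> filter -> mapValues -> sink.
        System.out.println(buildTopology().describe());
    }
}
```

Keeping the per-record logic in plain static methods means it can be unit tested without Kafka, while the topology only wires those methods together.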
Under the hood, Kafka Streams consumes records from Kafka topics, processes them as they arrive, and tracks its progress through committed offsets. If an application instance fails, processing can resume from the last committed position instead of losing data. This makes KStream a powerful tool for building applications that react to new information with low latency.
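How strictly an application recovers is partly a matter of configuration. The sketch below shows a few settings that are commonly adjusted for resilience. It assumes Kafka Streams 3.0 or later (where `StreamsConfig.EXACTLY_ONCE_V2` is available); the application ID, broker address, and the chosen values are placeholders, not recommendations.

```java
import org.apache.kafka.streams.StreamsConfig;

import java.util.Properties;

public class ResilientConfigSketch {

    static Properties resilientProps() {
        Properties props = new Properties();

        // Placeholder identity and broker address.
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "resilient-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        // The default guarantee is at-least-once; exactly-once is opt-in.
        props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_V2);

        // Keep one warm copy of each local state store on another instance.
        props.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 1);

        // Run two processing threads inside this instance.
        props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 2);

        return props;
    }

    public static void main(String[] args) {
        resilientProps().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

These properties would be passed to the `KafkaStreams` constructor together with a topology. Standby replicas only matter for topologies that keep local state.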
Example
This example shows how to create a KStream from a Kafka topic, convert all values to uppercase, and write the results to another topic.

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;

    import java.util.Properties;

    public class UppercaseStream {
        public static void main(String[] args) {
            // Basic configuration: application ID, broker address, default String serdes.
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-app");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());

            // Build the topology: read from the input topic, uppercase each value, write the result.
            StreamsBuilder builder = new StreamsBuilder();
            KStream<String, String> input = builder.stream("input-topic");
            // Null values pass through unchanged to avoid a NullPointerException.
            KStream<String, String> uppercased =
                    input.mapValues(value -> value == null ? null : value.toUpperCase());
            uppercased.to("output-topic");

            KafkaStreams streams = new KafkaStreams(builder.build(), props);
            // Register the shutdown hook before starting so the application closes cleanly on exit.
            Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
            streams.start();
        }
    }

When to Use
Use KStream when you need to process data as it arrives, such as monitoring sensor data, tracking user activity on websites, or transforming logs for analysis. It is ideal for applications that require continuous data processing with low latency.
For example, an online store can use KStream to update inventory counts instantly as orders come in, or a financial app can detect fraud by analyzing transaction streams live.
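As a rough illustration of the fraud-detection case, the sketch below forwards unusually large transactions to a separate topic. It is a toy rule under stated assumptions, not a real fraud model: the topic names, the `THRESHOLD` value, the class name, and the assumption that records are keyed by account ID with a `Double` amount as the value are all hypothetical.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;

public class LargeTransactionFilter {

    // Hypothetical cutoff; a real system would use a far richer rule set.
    static final double THRESHOLD = 10_000.0;

    // A transaction is flagged when its amount is present and strictly above the cutoff.
    static boolean isSuspicious(Double amount) {
        return amount != null && amount > THRESHOLD;
    }

    static Topology buildTopology() {
        StreamsBuilder builder = new StreamsBuilder();

        // Assumed layout: key = account ID, value = transaction amount.
        KStream<String, Double> transactions =
                builder.stream("transactions", Consumed.with(Serdes.String(), Serdes.Double()));

        transactions
                .filter((accountId, amount) -> isSuspicious(amount))
                .to("suspicious-transactions", Produced.with(Serdes.String(), Serdes.Double()));

        return builder.build();
    }

    public static void main(String[] args) {
        // Prints the processor graph; no broker connection is needed for this step.
        System.out.println(buildTopology().describe());
    }
}
```

A downstream service could consume `suspicious-transactions` and decide what to do with each flagged record.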
Key Points
- KStream represents a stream of key-value records from Kafka topics.
- It supports real-time processing with operations like map, filter, and join.
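- The Kafka Streams API handles fault tolerance and scaling for you: work is spread across the application instances that share an application ID and is reassigned if one of them fails.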
- It is used for building event-driven, streaming applications.
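Of the operations listed above, the join is the least obvious, so here is one possible shape: a stream of click events enriched from a table of user names. This is a sketch under assumptions. The topic names, the class name, and the `enrich` helper are hypothetical, both topics are assumed to be keyed by user ID and partitioned the same way, and a stream-table join is only one of several join types the API offers.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Joined;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

public class ClickEnrichmentSketch {

    // Hypothetical combiner: merges the click value with the looked-up user name.
    static String enrich(String page, String userName) {
        return userName + " viewed " + page;
    }

    static Topology buildTopology() {
        StreamsBuilder builder = new StreamsBuilder();

        // Stream of events: key = user ID, value = page path.
        KStream<String, String> clicks =
                builder.stream("page-clicks", Consumed.with(Serdes.String(), Serdes.String()));

        // Table of latest values: key = user ID, value = user name.
        KTable<String, String> users =
                builder.table("user-names", Consumed.with(Serdes.String(), Serdes.String()));

        // Inner join: clicks whose key has no entry in the table are dropped.
        clicks.join(users,
                        (page, userName) -> enrich(page, userName),
                        Joined.with(Serdes.String(), Serdes.String(), Serdes.String()))
              .to("enriched-clicks", Produced.with(Serdes.String(), Serdes.String()));

        return builder.build();
    }

    public static void main(String[] args) {
        // Prints the processor graph without connecting to a broker.
        System.out.println(buildTopology().describe());
    }
}
```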
Key Takeaways
- KStream is a continuous stream of records for real-time processing in Kafka Streams.
- Use KStream for applications that need to react immediately to data changes.