What is Kafka Streams: Simple Explanation and Example
Kafka Streams is a Java library for building applications that process and analyze data from Apache Kafka topics in real time. It lets you transform, filter, and aggregate data streams inside your own application, without a separate processing cluster.
How It Works
Imagine you have a river of data flowing continuously, like water in a stream. Kafka Streams acts like a watermill on that river, processing the data as it flows by. It reads records from Kafka topics, processes them in real time, and writes the results back to other topics.
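Kafka Streams runs inside your application as a library, so there is no separate processing cluster to manage; the only servers involved are the Kafka brokers that hold the topics. It commits its position (offset) in each input topic back to Kafka, so if your app stops and restarts, it resumes from the last committed offset instead of starting over or skipping records. With the default settings this is at-least-once processing, which means a few records may be processed again after a crash. To scale out, start more instances of the application with the same application ID, and the topic partitions are divided among them.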
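The resume-after-restart behaviour can be pictured with a toy model. The sketch below is not Kafka's API; the class name `OffsetResumeSketch` and its methods are made up for illustration, and a plain list stands in for a topic partition. It only shows the idea that progress is stored outside the app's memory and read back on the next run.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of offset tracking. This is NOT Kafka's API, only an illustration. */
public class OffsetResumeSketch {
    private final List<String> log;   // stands in for one topic partition
    private int committedOffset = 0;  // stands in for the offset stored in Kafka

    public OffsetResumeSketch(List<String> log) {
        this.log = List.copyOf(log);
    }

    /** One "run" of the app: process up to maxRecords from the committed offset, then commit. */
    public List<String> runApp(int maxRecords) {
        List<String> output = new ArrayList<>();
        int position = committedOffset;               // resume where the last run stopped
        while (position < log.size() && output.size() < maxRecords) {
            output.add(log.get(position).toUpperCase());
            position++;
        }
        committedOffset = position;                   // record progress outside the app's memory
        return output;
    }

    public int committedOffset() {
        return committedOffset;
    }

    public static void main(String[] args) {
        OffsetResumeSketch partition = new OffsetResumeSketch(List.of("a", "b", "c", "d", "e"));
        System.out.println(partition.runApp(2));   // prints [A, B]
        // Simulated restart: the app's local variables are gone, the committed offset is not.
        System.out.println(partition.runApp(10));  // prints [C, D, E]
    }
}
```

The second call picks up at the third record because the only thing it reads is the committed offset, which is the same reason a restarted Kafka Streams app does not begin again from the first record.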
Example
This example shows a simple Kafka Streams application that reads text lines from one topic, converts them to uppercase, and writes them to another topic.
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

import java.util.Properties;

public class UppercaseStream {
    public static void main(String[] args) {
        // Basic configuration: application ID, broker address, and default String serdes.
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());

        // Topology: read from input-topic, uppercase each value, write to output-topic.
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> input = builder.stream("input-topic");
        KStream<String, String> uppercased = input.mapValues(value -> value.toUpperCase());
        uppercased.to("output-topic");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);

        // Register the shutdown hook before starting so the app closes cleanly on Ctrl+C.
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        streams.start();
    }
}
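To try the uppercase pipeline without a running broker, one option is the `TopologyTestDriver` from the `kafka-streams-test-utils` artifact. The sketch below assumes that artifact is on the classpath in addition to `kafka-streams`; the class name `UppercaseTopologyCheck` and the application ID are made up for illustration, and the topology is rebuilt here so the block stands on its own.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.TestInputTopic;
import org.apache.kafka.streams.TestOutputTopic;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.TopologyTestDriver;
import org.apache.kafka.streams.kstream.KStream;

import java.util.List;
import java.util.Properties;

public class UppercaseTopologyCheck {

    /** Builds the same read, uppercase, write pipeline as the example above. */
    public static Topology buildTopology() {
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> input = builder.stream("input-topic");
        input.mapValues(value -> value.toUpperCase()).to("output-topic");
        return builder.build();
    }

    /** Pipes the given lines through the topology in memory and returns the output values. */
    public static List<String> run(List<String> lines) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-check");
        // The test driver never connects to a broker, so this address is a placeholder.
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "dummy:1234");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());

        try (TopologyTestDriver driver = new TopologyTestDriver(buildTopology(), props)) {
            TestInputTopic<String, String> in =
                driver.createInputTopic("input-topic", new StringSerializer(), new StringSerializer());
            TestOutputTopic<String, String> out =
                driver.createOutputTopic("output-topic", new StringDeserializer(), new StringDeserializer());
            in.pipeValueList(lines);          // records are sent with null keys
            return out.readValuesToList();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("hello", "kafka streams")));
    }
}
```

Keeping the topology in its own `buildTopology()` method is a design choice that lets the same code be handed either to `KafkaStreams` in production or to the test driver in a unit test.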
When to Use
Use Kafka Streams when you want to process data in real time as it flows through Kafka. It is well suited to tasks like filtering events, transforming data formats, counting occurrences, or joining streams.
Real-world examples include monitoring user activity on websites, detecting fraud in financial transactions, or aggregating sensor data from IoT devices. It is ideal when you want to embed stream processing directly into your Java applications without managing extra infrastructure.
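As a rough illustration of the filtering and counting tasks mentioned above, here is a minimal sketch of a page-view counter. The topic names `page-views` and `page-view-counts`, the record layout (key = user ID, value = page path), and the `/health` filter are all assumptions made up for this example.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

public class PageViewCounts {

    /** Hypothetical layout: key = user ID, value = page path. */
    public static Topology buildTopology() {
        StreamsBuilder builder = new StreamsBuilder();

        KStream<String, String> views =
            builder.stream("page-views", Consumed.with(Serdes.String(), Serdes.String()));

        KTable<String, Long> viewsPerPage = views
            // Filtering: drop empty records and health-check requests.
            .filter((userId, page) -> page != null && !page.startsWith("/health"))
            // Re-key by page so that all views of one page are grouped together.
            .groupBy((userId, page) -> page, Grouped.with(Serdes.String(), Serdes.String()))
            // Counting: a running count per page, kept in a local state store.
            .count();

        // Each count update is written out as a (page, count) record.
        viewsPerPage.toStream()
            .to("page-view-counts", Produced.with(Serdes.String(), Serdes.Long()));

        return builder.build();
    }

    public static void main(String[] args) {
        // Prints the processing graph; no broker is needed just to describe it.
        System.out.println(buildTopology().describe());
    }
}
```

To run it against a cluster, the returned topology would be passed to `new KafkaStreams(topology, props)` in the same way as in the uppercase example.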
Key Points
- Kafka Streams is a lightweight Java library for stream processing.
- It processes data directly from Kafka topics in real time.
- No need for separate processing clusters or servers.
- Supports transformations, filtering, aggregations, and joins.
- Handles fault tolerance and state management for you, backing up local state to Kafka changelog topics.