State stores in Kafka - Commands & Configuration

Introduction
State stores let a Kafka Streams application remember data from one record to the next, like a notebook that saves your progress between processing steps. Each store lives on the machine running the application and, by default, is backed up to a changelog topic in Kafka.
Use a state store:
When you want to count how many times a word appears in a stream of messages.
When you need to join data from two streams and keep track of matching records.
When you want to maintain a running total or aggregate of events over time.
When you need to recover your application's state after a restart without losing data.
When you want to query the current state of your streaming data in real time.
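The counting and recovery cases above can be pictured with a toy model. The sketch below is plain Java, not the Kafka Streams API: a HashMap stands in for the local store, and a list stands in for the changelog topic that records each store update. The class and method names (WordCountStoreModel, process, restore, get) are made up for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only (hypothetical names, not the Kafka Streams API).
final class WordCountStoreModel {
    // Stands in for the local state store.
    private final Map<String, Long> counts = new HashMap<>();
    // Stands in for the changelog topic: every update is appended here.
    private final List<Map.Entry<String, Long>> changelog;

    WordCountStoreModel(List<Map.Entry<String, Long>> changelog) {
        this.changelog = changelog;
    }

    // Process one input line: bump the count of each word and log the update.
    void process(String line) {
        for (String word : line.toLowerCase().split("\\W+")) {
            if (word.isEmpty()) {
                continue; // split can yield an empty first token
            }
            long updated = counts.merge(word, 1L, Long::sum);
            changelog.add(Map.entry(word, updated));
        }
    }

    // Rebuild a store by replaying the changelog; the last value per key wins.
    static WordCountStoreModel restore(List<Map.Entry<String, Long>> changelog) {
        WordCountStoreModel restored = new WordCountStoreModel(changelog);
        for (Map.Entry<String, Long> change : changelog) {
            restored.counts.put(change.getKey(), change.getValue());
        }
        return restored;
    }

    // Point lookup by key; unknown words count as zero.
    long get(String word) {
        return counts.getOrDefault(word, 0L);
    }
}
```

After process("hello kafka hello"), the changelog holds three entries, and a store rebuilt with restore(changelog) reports 2 for "hello". That replay step is the idea behind recovering state after a restart.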
Config File - state-store.properties
application.id=my-streams-app
bootstrap.servers=localhost:9092
state.dir=/tmp/kafka-streams
cache.max.bytes.buffering=10485760
commit.interval.ms=1000

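application.id: Required. Unique ID for your Kafka Streams app. It is also used as the consumer group ID, as the prefix of internal topic names, and as the name of the app's subfolder under state.dir.
bootstrap.servers: Required. Kafka broker addresses used for the initial connection.
state.dir: Local folder where state stores are saved. If unset, a kafka-streams folder under the system temp directory is used (/tmp/kafka-streams on Linux).
cache.max.bytes.buffering: Maximum memory, in bytes, for the record caches that buffer updates before they are written to the state store and sent downstream. 10485760 is 10 MB. Recent Kafka releases deprecate this name in favor of statestore.cache.max.bytes.
commit.interval.ms: How often, in milliseconds, the app commits its processing position and flushes state stores. The default is 30000 with at-least-once processing.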
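To see why cache.max.bytes.buffering and commit.interval.ms affect how many updates leave the application, here is a toy write-back cache in plain Java. It is not the Kafka Streams implementation: the names are hypothetical, the limit counts entries instead of bytes, and exceeding the limit flushes everything at once, which is a simplification. The point it illustrates is that repeated updates to one key between flushes collapse into a single write.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only (hypothetical names, not the Kafka Streams record cache).
final class RecordCacheModel {
    private final int maxEntries; // stands in for the byte limit
    // Pending updates, at most one per key, in first-insertion order.
    private final Map<String, Long> pending = new LinkedHashMap<>();
    // Everything that has been written out so far.
    private final List<Map.Entry<String, Long>> flushed = new ArrayList<>();

    RecordCacheModel(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    // Buffer an update; a newer value for the same key replaces the older one.
    void put(String key, long value) {
        pending.put(key, value);
        if (pending.size() > maxEntries) {
            flush(); // cache is full
        }
    }

    // Models a commit: write out the latest value of every pending key.
    void flush() {
        pending.forEach((key, value) -> flushed.add(Map.entry(key, value)));
        pending.clear();
    }

    List<Map.Entry<String, Long>> flushed() {
        return flushed;
    }
}
```

Three puts (hello=1, hello=2, kafka=1) followed by one flush produce two writes, hello=2 and kafka=1. A smaller cache or a shorter commit interval means more flushes, so more intermediate values become visible.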

Commands
Create an input topic where messages will be sent for processing.
Terminal
kafka-topics --create --topic word-count-input --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1
Expected Output
Created topic word-count-input.
--topic - Name of the Kafka topic to create
--bootstrap-server - Kafka broker address to connect to
--partitions - Number of partitions for the topic
--replication-factor - Number of replicas kept for each partition
Send messages to the input topic to test the state store counting.
Terminal
kafka-console-producer --topic word-count-input --bootstrap-server localhost:9092
Expected Output
A > prompt appears and waits for input. Type one message per line and press Enter to send it. Press Ctrl+C to exit.
--topic - Topic to send messages to
--bootstrap-server - Kafka server address
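Start the Kafka Streams application that uses a state store to count words. Kafka Streams is a Java library, and the Kafka distribution does not include a kafka-streams launcher command, so run the application's main class with java. The jar name below is a placeholder, and the application is assumed to load the properties file passed as its first argument.
Terminal
java -cp word-count-app.jar com.example.WordCountApp state-store.properties
Expected Output
Log lines depend on the application's logging setup. Look for the KafkaStreams client reporting a state transition to RUNNING, and for a my-streams-app folder appearing under /tmp/kafka-streams.
-cp - Classpath containing the application and the Kafka Streams libraries
com.example.WordCountApp - Main class of the Kafka Streams application
state-store.properties - Configuration file, passed as a program argument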
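Query the current counts stored in the state store to see results. Kafka has no command-line tool that reads a state store directly. Inside the application, stores are read through interactive queries (KafkaStreams#store). From a terminal, you can read the store's changelog topic instead, which Kafka Streams names <application.id>-<store name>-changelog. The command below assumes the store is named word-count-store and holds Long counts.
Terminal
kafka-console-consumer --topic my-streams-app-word-count-store-changelog --bootstrap-server localhost:9092 --from-beginning --property print.key=true --value-deserializer org.apache.kafka.common.serialization.LongDeserializer
Expected Output
One line per update, with the word and its count separated by a tab, for example:
word1	5
word2	3
word3	7
A word can appear more than once. Its last line is the current count.
--topic - Changelog topic of the state store
--bootstrap-server - Kafka server address
--from-beginning - Read the topic from its first record
--property print.key=true - Print the word (key) next to its count
--value-deserializer - Decodes the count, which is stored as a Long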
Key Concept

If you remember nothing else from this pattern, remember: state stores let your streaming app remember and query data between processing steps.

Common Mistakes
Not setting a unique application.id in the config
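application.id is required, so the app fails at startup without it. If two different apps share the same value, they join the same consumer group and share internal topics, so their work and state get mixed.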
Always set a unique application.id for each Kafka Streams application.
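Leaving state.dir at its temp-folder default or pointing it at a non-writable directory
Files under the system temp directory can be cleaned up by the operating system, which forces the app to rebuild its stores from the changelog topics. If the directory is unwritable, the app fails to start.
Set state.dir to a valid, writable local path that survives restarts.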
Querying the state store before the application has processed any data
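Until the app has reached the RUNNING state, an interactive query throws InvalidStateStoreException. Until input has been processed and committed, the store and its changelog topic are empty.
Send some input data, wait for the app to be running and to process it, then query the state store.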
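The first two mistakes can be caught before the application starts. The following is a minimal sketch in plain Java using only java.util.Properties and java.nio.file. StreamsConfigCheck and its methods are hypothetical helpers, not part of Kafka. It treats application.id and bootstrap.servers as mandatory and reports a state.dir that exists but is not a writable directory.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical pre-flight check, not part of Kafka.
final class StreamsConfigCheck {
    // Parse the text of a .properties file.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Returns readable problems; an empty list means the basic checks passed.
    static List<String> problems(Properties props) {
        List<String> problems = new ArrayList<>();
        for (String key : List.of("application.id", "bootstrap.servers")) {
            String value = props.getProperty(key);
            if (value == null || value.isBlank()) {
                problems.add("missing required setting: " + key);
            }
        }
        String stateDir = props.getProperty("state.dir");
        if (stateDir != null) {
            Path dir = Path.of(stateDir);
            boolean usable = Files.isDirectory(dir) && Files.isWritable(dir);
            // A path that does not exist yet is not reported here.
            if (Files.exists(dir) && !usable) {
                problems.add("state.dir is not a writable directory: " + stateDir);
            }
        }
        return problems;
    }
}
```

To check the file from this tutorial, pass the contents of state-store.properties (for example from Files.readString) to parse and print whatever problems returns. A missing state.dir folder is skipped on the assumption that it can be created at startup.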
Summary
Create Kafka topics to send and receive streaming data.
Configure Kafka Streams with a unique application ID and local state directory.
Run the Kafka Streams app to process data and store intermediate results in state stores.
Query the state store to get the current stored data like counts or aggregates.