State stores in Kafka - Commands & Configuration

Introduction

State stores let a Kafka Streams application keep local state across records, so a processing step can remember what it has already seen, like a notebook that saves your progress. Typical cases:

When you want to count how many times a word appears in a stream of messages.

When you need to join data from two streams and keep track of matching records.

When you want to maintain a running total or aggregate of events over time.

When you need to recover your application's state after a restart without losing data.

When you want to query the current state of your streaming data in real time.
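
To make the notebook idea concrete, here is a plain-Java toy model of a word-count store. It is not the Kafka Streams API: the class ToyWordCountStore and the in-memory list standing in for a changelog topic are assumptions made for illustration. It sketches how a store can remember counts between messages and rebuild them after a restart by replaying recorded updates.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a changelog-backed key-value store; NOT the Kafka Streams API. */
final class ToyWordCountStore {
    // Local state: word -> count.
    private final Map<String, Long> counts = new HashMap<>();
    // Stand-in for a changelog topic: every update is appended here.
    private final List<Map.Entry<String, Long>> changelog;

    ToyWordCountStore(List<Map.Entry<String, Long>> changelog) {
        this.changelog = changelog;
        // Restore: replay the recorded updates so the latest value per key wins.
        for (Map.Entry<String, Long> update : changelog) {
            counts.put(update.getKey(), update.getValue());
        }
    }

    /** Processes one message: splits it into words and increments each count. */
    void process(String line) {
        for (String word : line.toLowerCase().split("\\W+")) {
            if (word.isEmpty()) {
                continue; // split() yields an empty token for leading punctuation
            }
            long updated = counts.merge(word, 1L, Long::sum);
            changelog.add(Map.entry(word, updated));
        }
    }

    /** Point lookup, comparable to querying the store for one key. */
    long count(String word) {
        return counts.getOrDefault(word, 0L);
    }
}
```

Constructing a second ToyWordCountStore from the same changelog list yields the same counts, which mirrors, in a much simplified form, how a restarted application can restore its state.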

Config File - state-store.properties

application.id=my-streams-app
bootstrap.servers=localhost:9092
state.dir=/tmp/kafka-streams
cache.max.bytes.buffering=10485760
commit.interval.ms=1000

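application.id: Required. Unique ID for your Kafka Streams app; it is also used as the consumer group ID, as the prefix of internal topic names, and as the subfolder name under state.dir, which keeps the app's state isolated.
bootstrap.servers: Required. Comma-separated host:port list of Kafka brokers to connect to.
state.dir: Local folder where state stores are kept on disk.
cache.max.bytes.buffering: Maximum memory, in bytes, for the record cache shared by all stream threads (here 10 MB). Updates are buffered in this cache before being written to the state store and forwarded downstream. Newer Kafka releases deprecate this name in favor of statestore.cache.max.bytes.
commit.interval.ms: How often, in milliseconds, the app commits its progress. On each commit, cached updates are flushed to the state store and its changelog topic.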
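
A minimal sketch of how an application might read this file before starting Kafka Streams. StreamsConfigLoader is a hypothetical helper, not part of Kafka; it uses only java.util.Properties and rejects a file that lacks application.id or bootstrap.servers, the two settings Kafka Streams requires.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

/** Hypothetical helper: loads state-store.properties and checks required keys. */
final class StreamsConfigLoader {
    static Properties load(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Fail early with a readable message if a required setting is absent.
        for (String required : new String[] {"application.id", "bootstrap.servers"}) {
            String value = props.getProperty(required);
            if (value == null || value.isBlank()) {
                throw new IllegalArgumentException("Missing required setting: " + required);
            }
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        Path file = Path.of(args.length > 0 ? args[0] : "state-store.properties");
        try (Reader reader = Files.newBufferedReader(file)) {
            Properties props = load(reader);
            // A real app would hand these Properties to its KafkaStreams instance.
            props.forEach((key, value) -> System.out.println(key + " = " + value));
        }
    }
}
```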

Commands

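Create the input topic that will receive the messages to process. In the Apache Kafka download the command-line tools live in bin/ and carry a .sh suffix (for example bin/kafka-topics.sh); Confluent Platform installs them without the suffix, as shown here.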

Terminal

kafka-topics --create --topic word-count-input --bootstrap-server localhost:9092 --partitions 1 --replication-factor 1

Expected Output

Created topic word-count-input.

--topic - Name of the Kafka topic to create

--bootstrap-server - Kafka broker address to connect to

--partitions - Number of partitions for the topic

--replication-factor - Number of copies of each partition kept across brokers

Send messages to the input topic to test the state store counting.

Terminal

kafka-console-producer --topic word-count-input --bootstrap-server localhost:9092

Expected Output

The producer prints a > prompt and waits for input. Each line you type is sent as one message; press Ctrl+C to exit.

--topic - Topic to send messages to

--bootstrap-server - Kafka broker address to connect to

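Start the Kafka Streams application that uses a state store to count words. Kafka Streams is a Java library, not a separate server or command-line tool, so the application is launched like any other Java program. The classpath below is a placeholder for your own build output, and passing the config file as the first argument assumes the main class loads it.

Terminal

java -cp <your-app-classpath> com.example.WordCountApp state-store.properties

Expected Output

Log output depends on the application's logging setup. With INFO logging enabled, the KafkaStreams client reports a state transition from REBALANCING to RUNNING once it is ready to process records.

<your-app-classpath> - Jar files or directories containing the application and the kafka-streams library

com.example.WordCountApp - Main class of the Kafka Streams application

state-store.properties - Configuration file for the streams app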

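Query the current counts stored in the state store to see results. Kafka does not ship a command-line tool that queries a state store directly. Inside the application, use Interactive Queries: call KafkaStreams#store with StoreQueryParameters.fromNameAndType("word-count-store", QueryableStoreTypes.keyValueStore()), then read from the returned ReadOnlyKeyValueStore with get(key) or all(). From a terminal, you can read the store's changelog topic instead, which Kafka Streams names <application.id>-<store name>-changelog. The command below assumes the store holds String keys and Long counts and has changelogging enabled (the default).

Terminal

kafka-console-consumer --topic my-streams-app-word-count-store-changelog --bootstrap-server localhost:9092 --from-beginning --property print.key=true --property value.deserializer=org.apache.kafka.common.serialization.LongDeserializer

Expected Output

One line per update, with the word and its count separated by a tab. The last line printed for a word is its current count, for example 5 for word1, 3 for word2, and 7 for word3.

--topic - Changelog topic that backs the state store

--bootstrap-server - Kafka broker address to connect to

--from-beginning - Read the topic from its first record instead of only new ones

--property print.key=true - Print each record's key (the word) next to its value

--property value.deserializer=... - Decode the value bytes as a Long count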
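
The application class and the store lookup from the last two steps could look roughly like the sketch below. It is one possible shape for com.example.WordCountApp, not the original author's code: the class layout, the treatment of each message as a line of text, and the currentCount helper are assumptions, while the topic and store names are taken from this page. It needs the kafka-streams library (2.5 or later for StoreQueryParameters) on the classpath; the package declaration is left out for brevity.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StoreQueryParameters;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

final class WordCountApp {
    static final String STORE = "word-count-store";

    /** Builds a topology that counts words and keeps the counts in a named state store. */
    static Topology buildTopology() {
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("word-count-input", Consumed.with(Serdes.String(), Serdes.String()))
            // Assumption: every message value is a line of plain text.
            .flatMapValues(line -> Arrays.asList(line.toLowerCase().split("\\W+")))
            .filter((key, word) -> !word.isEmpty())
            // Re-key by word so that equal words are counted together.
            .groupBy((key, word) -> word, Grouped.with(Serdes.String(), Serdes.String()))
            // Materialize the running count under a name so it can be queried later.
            .count(Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as(STORE)
                .withKeySerde(Serdes.String())
                .withValueSerde(Serdes.Long()));
        return builder.build();
    }

    /**
     * Interactive Query: returns the current count for a word, or null if unseen.
     * Throws InvalidStateStoreException while the app is not yet RUNNING.
     */
    static Long currentCount(KafkaStreams streams, String word) {
        ReadOnlyKeyValueStore<String, Long> store = streams.store(
            StoreQueryParameters.fromNameAndType(
                STORE, QueryableStoreTypes.<String, Long>keyValueStore()));
        return store.get(word);
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        String configFile = args.length > 0 ? args[0] : "state-store.properties";
        try (InputStream in = new FileInputStream(configFile)) {
            props.load(in);
        }
        KafkaStreams streams = new KafkaStreams(buildTopology(), props);
        // Close cleanly on Ctrl+C so state is flushed before the JVM exits.
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
        streams.start();
    }
}
```

Serdes are set explicitly because the sample properties file defines no default serdes. How currentCount is exposed (a REST endpoint, a scheduled log line, a test) is left open here.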

Key Concept

If you remember nothing else from this pattern, remember: state stores let your streaming app remember and query data between processing steps.

Common Mistakes

Not setting a unique application.id in the config

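application.id is required, so the app fails at startup with a configuration error if it is missing. It also serves as the consumer group ID and as the prefix for internal topics and the state folder, so two different apps that share one ID split the input partitions between them and mix their state.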

Always set a unique application.id for each Kafka Streams application.

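Leaving state.dir at its default or pointing it at a non-writable directory

By default, state is kept in a kafka-streams folder under the system temp directory, which the operating system may clear on reboot. Kafka Streams creates the folder if it does not exist, but the app fails at startup if the folder cannot be created or written to.

Set state.dir to a valid, writable local path, preferably outside the temp directory for anything beyond local testing.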

Querying the state store before the application has processed any data

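Until the application reaches the RUNNING state, store lookups throw InvalidStateStoreException. After that, the store stays empty until records have been processed, so lookups return null.

Wait for the application to report RUNNING, send some input data, and allow time for processing before querying the state store.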
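
The state.dir mistake can be caught before startup with a small pre-flight check. This sketch assumes a hypothetical StateDirCheck helper (not part of Kafka). Because Kafka Streams creates the state folder when it does not exist yet, the check walks up to the nearest existing parent and tests that for write access.

```java
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical pre-flight check for the state.dir setting. */
final class StateDirCheck {
    /** Returns true if dir exists and is writable, or could be created under a writable parent. */
    static boolean isUsable(Path dir) {
        Path current = dir.toAbsolutePath();
        // Walk up until we reach something that already exists on disk.
        while (current != null && !Files.exists(current)) {
            current = current.getParent();
        }
        return current != null && Files.isDirectory(current) && Files.isWritable(current);
    }

    public static void main(String[] args) {
        Path dir = Path.of(args.length > 0 ? args[0] : "/tmp/kafka-streams");
        System.out.println(dir + (isUsable(dir) ? " is usable" : " cannot be created or written"));
    }
}
```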

Summary

Create Kafka topics to send and receive streaming data.

Configure Kafka Streams with a unique application ID and local state directory.

Run the Kafka Streams app to process data and store intermediate results in state stores.

Query the state store to get the current stored data like counts or aggregates.