Kafka · DevOps · ~15 mins

Message key and value in Kafka - Deep Dive

Overview - Message key and value
What is it?
In Kafka, every message consists of two main parts: a key and a value. The key is an identifier that Kafka uses to decide which partition the message goes to, while the value is the actual data or content of the message. Both key and value are byte arrays, but they often represent strings or structured data. This separation helps Kafka organize and process messages efficiently.
Why it matters
Message keys allow Kafka to group related messages together in the same partition, preserving their order. Without keys, messages may be scattered across partitions, making it hard to process related data in sequence. This is crucial for applications like financial transactions or user activity logs, where order and grouping matter; without them, such systems would struggle to maintain consistency and reliability.
Where it fits
Before learning about message keys and values, you should understand Kafka basics like topics and partitions. After this, you can explore Kafka consumer groups and how they read messages. Later, you can learn about Kafka's exactly-once semantics and stateful stream processing, which rely heavily on keys.
Mental Model
Core Idea
The message key in Kafka directs where the message goes, while the value carries the actual information to be processed.
Think of it like...
Imagine a post office where the key is the address on the envelope guiding the letter to the right mailbox, and the value is the letter inside with the message you want to send.
Kafka Topic
┌───────────────┐
│ Partition 0   │◄── Messages with key hashing to 0
│ Partition 1   │◄── Messages with key hashing to 1
│ Partition 2   │◄── Messages with key hashing to 2
└───────────────┘

Message Structure:
┌───────────────┐
│ Key           │───> Determines partition
│ Value         │───> Actual data
└───────────────┘
Build-Up - 7 Steps
Step 1 (Foundation): Understanding Kafka Messages
Concept: Kafka messages have two parts: key and value.
Every message sent to Kafka consists of a key and a value. The key is optional but helps Kafka decide which partition to store the message in. The value is the main content you want to send, like a log entry or user data.
Result
You know that Kafka messages are not just raw data but structured with a key and value.
Understanding that messages have keys and values is the foundation for how Kafka organizes and processes data.
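To make the key/value split concrete, here is a toy stand-in for a Kafka message as a plain Java record. This is an illustrative sketch only; in the real client library this role is played by org.apache.kafka.clients.producer.ProducerRecord.

```java
// Toy stand-in for a Kafka message: a key plus a value, nothing more.
// In the real client this role is played by
// org.apache.kafka.clients.producer.ProducerRecord.
class KafkaMessageDemo {
    // The key is a routing hint; the value is the payload consumers read.
    record Message(String key, String value) {}

    public static void main(String[] args) {
        Message m = new Message("user-42", "{\"action\":\"login\"}");
        System.out.println("key = " + m.key() + ", value = " + m.value());
    }
}
```

The record deliberately has no other fields: everything Kafka needs for routing is in the key, and everything your application cares about is in the value.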
Step 2 (Foundation): Role of Partitions in Kafka
Concept: Partitions split a topic into multiple parts for scalability and ordering.
Kafka topics are divided into partitions. Each partition is an ordered sequence of messages. Kafka uses the message key to decide which partition a message belongs to, ensuring messages with the same key go to the same partition.
Result
You understand that partitions help Kafka scale and keep order for messages with the same key.
Knowing partitions exist explains why keys matter: they control message placement and order.
Step 3 (Intermediate): How Kafka Uses Message Keys
🤔 Before reading on: do you think Kafka always requires a key to send a message? Commit to your answer.
Concept: Kafka uses the key to hash and assign messages to partitions; keys can be null.
When you send a message, the producer hashes the key to pick a partition, so messages with the same key always land in the same partition, preserving their order. If the key is null, the producer spreads messages across partitions instead: round-robin in older clients, and sticky partitioning (filling a batch for one partition before moving to the next) since Kafka 2.4.
Result
You see how keys influence message routing and ordering in Kafka.
Understanding key-based partitioning helps you design systems that need ordered processing of related messages.
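The routing rule above can be sketched in a few lines of plain Java. This is a simplified simulation, not the real client: the actual producer hashes serialized key bytes with murmur2, while this sketch uses String.hashCode() as a stand-in, and models the null-key case as simple round-robin.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Simplified sketch of how a producer picks a partition from the key.
// The real client hashes serialized key bytes with murmur2; plain
// hashCode() is used here as a stand-in.
class PartitionRouter {
    private final int numPartitions;
    private final AtomicInteger roundRobin = new AtomicInteger();

    PartitionRouter(int numPartitions) { this.numPartitions = numPartitions; }

    int partitionFor(String key) {
        if (key == null) {
            // No key: spread messages across partitions (round-robin here;
            // since Kafka 2.4 the default is sticky partitioning per batch).
            return roundRobin.getAndIncrement() % numPartitions;
        }
        // Same key -> same hash -> same partition, every time.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        PartitionRouter router = new PartitionRouter(3);
        System.out.println(router.partitionFor("user-42")); // deterministic
        System.out.println(router.partitionFor("user-42")); // same partition again
        System.out.println(router.partitionFor(null));      // rotates
    }
}
```

The key property to notice: partitionFor is a pure function of the key, which is exactly what makes per-key ordering possible.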
Step 4 (Intermediate): Message Value: The Payload
Concept: The value holds the actual data you want to send and process.
The value part of a Kafka message contains the main information, like a JSON record, a string, or binary data. It is what consumers read and process. The key is mainly for routing, while the value is the content.
Result
You know the value is the message's useful data that applications consume.
Separating key and value clarifies their distinct roles: routing vs data.
Step 5 (Intermediate): Key Serialization and Data Types
🤔 Before reading on: do you think Kafka enforces a specific data type for keys and values? Commit to your answer.
Concept: Keys and values are byte arrays; serialization converts data to bytes.
Kafka stores keys and values as bytes. To send strings or objects, you serialize them into bytes using serializers like StringSerializer or AvroSerializer. Consumers deserialize bytes back to usable data. Both key and value need matching serializers and deserializers.
Result
You understand that Kafka is data-format agnostic and relies on serialization.
Knowing serialization is key to handling different data types and interoperability.
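What StringSerializer and StringDeserializer do is essentially a UTF-8 round trip, and serializers are wired into a producer through configuration. The sketch below shows both; the config keys (bootstrap.servers, key.serializer, value.serializer) are the real Kafka producer settings, while the broker address is a placeholder.

```java
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// StringSerializer/StringDeserializer are essentially a UTF-8 round trip.
// The Properties below show how serializers are configured on a producer
// (config keys are real Kafka settings; the broker address is a placeholder).
class SerializationDemo {
    public static void main(String[] args) {
        // Kafka only ever stores bytes; a serializer turns your data into them.
        byte[] keyBytes = "user-42".getBytes(StandardCharsets.UTF_8);
        byte[] valueBytes = "{\"action\":\"login\"}".getBytes(StandardCharsets.UTF_8);

        // The consumer side reverses this with a matching deserializer.
        String key = new String(keyBytes, StandardCharsets.UTF_8);
        System.out.println(key); // round-trips back to "user-42"

        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // new KafkaProducer<String, String>(props) would pick these up.
        System.out.println(props.getProperty("key.serializer"));
    }
}
```

Note that key and value are configured independently, which is why mismatched deserializers on the consumer side (see the pitfalls below) are such an easy mistake to make.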
Step 6 (Advanced): Impact of Keys on Consumer Processing
🤔 Before reading on: do you think consumers can receive messages out of order if keys are used? Commit to your answer.
Concept: Keys ensure message order within partitions, affecting consumer processing logic.
Because messages with the same key go to the same partition, consumers reading that partition get those messages in order. This is critical for applications that depend on processing events sequentially, like updating account balances or tracking user sessions.
Result
You see how keys help maintain order and consistency in consumer applications.
Understanding key-based ordering prevents bugs in systems that rely on event sequences.
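A toy simulation makes the ordering guarantee visible: route interleaved events for two accounts into partition logs, and each account's events come back in send order within its partition. This models partitions as in-memory lists and reuses the simplified hashCode routing from earlier; it is an illustration, not the Kafka client.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy illustration of per-key ordering: routing every event for a key to one
// partition means a consumer of that partition replays that key's events in
// send order, even when events for different keys interleave.
class OrderingDemo {
    static Map<Integer, List<String>> route(String[][] events, int numPartitions) {
        Map<Integer, List<String>> partitions = new HashMap<>();
        for (String[] e : events) {
            // e[0] is the key, e[1] is the value; hashCode() stands in for murmur2.
            int p = (e[0].hashCode() & 0x7fffffff) % numPartitions;
            partitions.computeIfAbsent(p, k -> new ArrayList<>()).add(e[0] + ": " + e[1]);
        }
        return partitions;
    }

    public static void main(String[] args) {
        String[][] events = {
            {"acct-1", "deposit 100"}, {"acct-2", "deposit 50"},
            {"acct-1", "withdraw 30"}, {"acct-2", "withdraw 20"},
        };
        // Within each partition, acct-1's (and acct-2's) events keep send order.
        route(events, 3).forEach((p, log) ->
            System.out.println("partition " + p + " -> " + log));
    }
}
```

If the withdraw ever arrived before the deposit, a balance check could wrongly reject it; keying by account ID rules that out.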
Step 7 (Expert): Key Design and Partitioning Strategies
🤔 Before reading on: do you think using a high-cardinality key (many unique keys) always improves performance? Commit to your answer.
Concept: Choosing keys affects load balancing, partition size, and system performance.
Keys with low cardinality (few unique values) can cause uneven partition loads, leading to hotspots. High cardinality keys spread load but may increase overhead. Some systems use composite keys or hash keys carefully to balance load and ordering needs. Understanding this helps optimize Kafka cluster performance and consumer throughput.
Result
You grasp how key choice impacts Kafka's scalability and efficiency.
Knowing key design tradeoffs helps build robust, high-performance Kafka systems.
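The hotspot effect of low-cardinality keys is easy to demonstrate: with only two distinct keys, no more than two partitions can ever receive data, no matter how many messages you send. This sketch again uses hashCode() as a stand-in for the real hash.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of how key cardinality caps partition usage: two distinct keys can
// reach at most two partitions, however many messages are sent -- the classic
// hotspot. hashCode() stands in for the client's murmur2 hash.
class CardinalityDemo {
    static Set<Integer> partitionsUsed(String[] keys, int numPartitions) {
        Set<Integer> used = new HashSet<>();
        for (String key : keys) {
            used.add((key.hashCode() & 0x7fffffff) % numPartitions);
        }
        return used;
    }

    public static void main(String[] args) {
        // 1,000 messages but only two distinct keys: at most 2 of 6 partitions used.
        String[] keys = new String[1000];
        for (int i = 0; i < keys.length; i++) keys[i] = (i % 2 == 0) ? "EU" : "US";
        System.out.println(partitionsUsed(keys, 6));
    }
}
```

Raising cardinality (for example by appending a session or entity ID to the key) lets the hash spread load across all partitions while still keeping each sub-entity's events ordered.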
Under the Hood
Kafka stores messages in partitions as ordered logs. When a producer sends a message, the client computes a hash of the serialized key (murmur2 by default). This hash determines the partition number by a modulo operation with the total number of partitions. The message is appended to that partition's log. Consumers read each partition sequentially, ensuring order for messages with the same key. Serialization converts keys and values to bytes for storage and transmission.
Why designed this way?
Kafka was designed for high throughput and scalability. Using keys to assign partitions allows parallel processing while preserving order for related messages. Storing messages as logs per partition simplifies replication and fault tolerance. Serialization allows Kafka to be data-agnostic, supporting many formats and languages.
Producer
  │
  ▼
[Key]───hash───┐
               │ modulo partitions
               ▼
           ┌─────────────┐
           │ Partition 0 │
           ├─────────────┤
           │ Partition 1 │
           ├─────────────┤
           │ Partition 2 │
           └─────────────┘

Consumer reads partitions in order

Message structure:
┌───────────┬─────────────┐
│ Key bytes │ Value bytes │
└───────────┴─────────────┘
Myth Busters - 4 Common Misconceptions
Quick: Do you think Kafka guarantees global ordering of all messages across partitions? Commit yes or no.
Common Belief: Kafka guarantees that all messages in a topic are globally ordered regardless of keys or partitions.
Reality: Kafka only guarantees order within a single partition, not across partitions. Keys keep related messages in the same partition to preserve order there.
Why it matters: Assuming global order can cause bugs when consumers process messages out of the expected sequence, leading to inconsistent application state.
Quick: Do you think a message key must always be unique? Commit yes or no.
Common Belief: Each message key in Kafka must be unique to identify messages distinctly.
Reality: Keys can repeat; in fact, messages with the same key go to the same partition and keep their order. Keys group related messages, they do not uniquely identify them.
Why it matters: Misunderstanding this leads to wrong key design, causing uneven load or broken ordering guarantees.
Quick: Do you think Kafka requires keys for all messages? Commit yes or no.
Common Belief: Kafka requires every message to have a key to function properly.
Reality: Keys are optional. If no key is provided, the producer distributes messages across partitions, which may lose ordering for related messages.
Why it matters: Not knowing this can cause unexpected message distribution and ordering issues in applications.
Quick: Do you think the message value affects partition selection? Commit yes or no.
Common Belief: The message value influences which partition Kafka assigns the message to.
Reality: Only the key is used for partition selection. The value is stored but does not affect routing.
Why it matters: Confusing value with key can lead to wrong assumptions about message distribution and ordering.
Expert Zone
1
Kafka's default partitioner uses a hash of the key, but custom partitioners can override this to implement complex routing logic.
2
Null keys make the producer fall back to keyless partitioning: round-robin in older clients, and sticky partitioning (one partition per batch, for better batching throughput) since Kafka 2.4. Either way, load is spread well, but ordering guarantees for related messages are lost.
3
Serialization format mismatches between producers and consumers can cause subtle bugs, especially when keys and values use different serializers.
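The custom-partitioner idea from point 1 above can be sketched without a broker. The real interface is org.apache.kafka.clients.producer.Partitioner, whose partition() method also receives serialized bytes and cluster metadata; the simplified interface below mimics only its core decision so the sketch can run standalone. The "VIP keys to partition 0" rule is a hypothetical example of business routing.

```java
// Simplified stand-in for Kafka's Partitioner interface. The real one,
// org.apache.kafka.clients.producer.Partitioner, also receives serialized
// key/value bytes and cluster metadata.
interface SimplePartitioner {
    int partition(String topic, String key, int numPartitions);
}

// Hypothetical routing rule: pin "vip-" keys to a dedicated partition,
// hash everyone else over the remaining partitions.
class VipPartitioner implements SimplePartitioner {
    @Override
    public int partition(String topic, String key, int numPartitions) {
        // Assumes numPartitions > 1 so non-VIP traffic has somewhere to go.
        if (key != null && key.startsWith("vip-")) {
            return 0; // dedicated partition for premium traffic
        }
        int hash = key == null ? 0 : (key.hashCode() & 0x7fffffff);
        return 1 + hash % (numPartitions - 1);
    }

    public static void main(String[] args) {
        VipPartitioner p = new VipPartitioner();
        System.out.println(p.partition("orders", "vip-42", 6)); // always 0
        System.out.println(p.partition("orders", "user-7", 6)); // somewhere in 1..5
    }
}
```

With the real client, a class implementing the Kafka Partitioner interface is activated via the producer's partitioner.class configuration.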
When NOT to use
Using keys is not ideal when message order does not matter and even load distribution is more important; in such cases, sending messages without keys or using random partitioners is better. For very high throughput with no ordering needs, keyless messages improve parallelism.
Production Patterns
In production, keys often represent business identifiers like user IDs or transaction IDs to ensure related events are processed in order. Composite keys combining multiple fields are used to balance load and ordering. Custom partitioners help route messages based on complex business rules.
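A composite key is usually just a deterministic join of business fields. The sketch below is illustrative: the field names (userId, sessionId) and the "|" separator are assumptions, not a Kafka convention; what matters is that the key function is stable so related events always hash identically.

```java
// Composite-key sketch: combining a business ID with a sub-entity ID keeps
// related events together while raising cardinality enough to spread load.
// Field names and the "|" separator are illustrative choices.
class CompositeKeyDemo {
    static String compositeKey(String userId, String sessionId) {
        // All events of one session share a key -> same partition, ordered.
        // Many sessions per user -> load spreads across partitions.
        return userId + "|" + sessionId;
    }

    public static void main(String[] args) {
        String k1 = compositeKey("user-42", "sess-a");
        String k2 = compositeKey("user-42", "sess-b");
        System.out.println(k1 + " and " + k2 + " may land on different partitions");
    }
}
```

The trade-off: keying by session instead of user gives finer load spreading, but you give up cross-session ordering for a single user.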
Connections
Hash Functions
Kafka uses hashing of keys to assign partitions.
Understanding hash functions helps grasp how Kafka distributes messages evenly and deterministically.
Database Sharding
Kafka partitions are like shards in databases, splitting data for scalability.
Knowing database sharding concepts clarifies why Kafka uses keys to group related data for efficient processing.
Postal System Sorting
Kafka's key-based partitioning is similar to postal sorting by address.
This connection shows how routing based on keys ensures messages reach the right place in order.
Common Pitfalls
#1: Sending all messages with the same key, overloading one partition.
Wrong approach:
producer.send(new ProducerRecord<>("topic", "sameKey", "value1"));
producer.send(new ProducerRecord<>("topic", "sameKey", "value2"));
// All messages go to one partition
Correct approach: use keys that spread load, e.g. a per-user ID:
producer.send(new ProducerRecord<>("topic", "user1", "value1"));
producer.send(new ProducerRecord<>("topic", "user2", "value2"));
Root cause: forgetting that the key alone controls partitioning, so a single shared key funnels every message to one partition.
#2: Assuming messages without keys maintain order for related data.
Wrong approach:
producer.send(new ProducerRecord<>("topic", null, "value1"));
producer.send(new ProducerRecord<>("topic", null, "value2"));
// Messages are spread across partitions; order is not guaranteed
Correct approach: assign the same key to related messages to preserve their order:
producer.send(new ProducerRecord<>("topic", "order123", "value1"));
producer.send(new ProducerRecord<>("topic", "order123", "value2"));
Root cause: not knowing that null keys let the producer distribute messages across partitions without ordering guarantees.
#3: Using different serializers for key and value without matching consumer deserializers.
Wrong approach: the producer uses StringSerializer for the key and AvroSerializer for the value, but the consumer uses StringDeserializer for both.
Correct approach: make the consumer mirror the producer: StringDeserializer for the key and AvroDeserializer for the value.
Root cause: ignoring that serialization formats must match between producers and consumers to avoid deserialization failures and corrupted data.
Key Takeaways
Kafka messages have a key and a value; the key directs message placement, and the value carries the data.
Keys determine which partition a message goes to, preserving order for messages with the same key.
Keys are optional; without them, Kafka distributes messages round-robin, losing ordering guarantees.
Serialization converts keys and values to bytes, allowing Kafka to handle any data format.
Choosing keys wisely affects load balancing, ordering, and overall system performance.