Overview - Partition key and routing

What is it?

Partition key and routing in Kafka determine how messages are distributed across different partitions of a topic. A partition key is a value attached to each message that helps Kafka decide which partition the message should go to. Routing is the process Kafka uses to assign messages to partitions based on the key or other rules. This ensures messages with the same key go to the same partition, preserving order for those messages.

Why it matters

Without partition keys and routing, Kafka would distribute messages randomly, breaking the order of related messages and making it hard to process data consistently. This would cause problems in systems that rely on message order, like financial transactions or user activity tracking. Proper routing improves scalability and fault tolerance by balancing load across partitions while keeping related data together.

Where it fits

Learners should first understand Kafka basics like topics and partitions. After mastering partition keys and routing, they can explore Kafka consumer groups and exactly-once processing. This topic fits in the middle of Kafka learning, bridging message organization and consumption.

Mental Model

Core Idea

Partition keys guide Kafka to send related messages to the same partition, preserving their order while different keys spread across the topic's partitions.

Think of it like...

Imagine a post office sorting letters by zip code. The zip code on each letter is like the partition key, directing the letter to the right sorting bin (partition) so all mail for one area stays together and is processed in order.

Kafka Topic
┌───────────────┐
│   Partition 0 │◄─ Messages with key hash to 0
│               │
├───────────────┤
│   Partition 1 │◄─ Messages with key hash to 1
│               │
├───────────────┤
│   Partition 2 │◄─ Messages with key hash to 2
└───────────────┘

Message flow:
[Message with key K] --hash(K)--> Partition N

All messages with same key K go to Partition N
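The mapping in the diagram can be sketched in a few lines of Python. This is a minimal illustration, not the Kafka client's code: `zlib.crc32` stands in for the real hash function and `partition_for` is a hypothetical helper name, so the partition numbers printed here will not match what an actual producer picks.

```python
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition index.

    crc32 is a stand-in hash (an assumption for this sketch),
    not the hash a real Kafka producer uses.
    """
    return zlib.crc32(key) % num_partitions

NUM_PARTITIONS = 3
keys = [b"user-1", b"user-2", b"user-1", b"user-3", b"user-1"]

for key in keys:
    # Every occurrence of the same key prints the same partition number.
    print(key.decode(), "->", partition_for(key, NUM_PARTITIONS))
```

Because the function depends only on the key bytes and the partition count, repeated keys always land in the same partition for a fixed count.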

Build-Up - 7 Steps

1. Foundation: What is a Kafka partition

Concept: Introduce the idea of partitions as separate logs within a Kafka topic.

A Kafka topic is split into multiple partitions. Each partition is an ordered, immutable sequence of messages. Partitions allow Kafka to scale by spreading data and load across servers. Messages in a partition are stored in the order they arrive.

Result

Learners understand that partitions are the basic units of parallelism and ordering in Kafka.

Knowing partitions exist explains why Kafka can handle large data volumes and why message order is only guaranteed within a partition.
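To make the "ordered, append-only log" idea concrete, here is a toy in-memory model. It is only a sketch: `ToyTopic` and its methods are hypothetical names for illustration and have nothing to do with Kafka's actual storage layer.

```python
class ToyTopic:
    """Toy topic: each partition is an append-only list, and a
    message's offset is simply its index in that list."""

    def __init__(self, num_partitions: int):
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition: int, value: str) -> int:
        """Append to one partition and return the new message's offset."""
        log = self.partitions[partition]
        log.append(value)
        return len(log) - 1

    def read(self, partition: int, offset: int = 0) -> list:
        """Read one partition from a given offset, in arrival order."""
        return self.partitions[partition][offset:]

topic = ToyTopic(num_partitions=2)
topic.append(0, "order-created")
topic.append(1, "page-view")
topic.append(0, "order-paid")

# Order is preserved inside each partition; nothing records how
# messages in partition 0 interleave with those in partition 1.
print(topic.read(0))
print(topic.read(1))
```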

2. Foundation: Role of keys in Kafka messages

3. Intermediate: How Kafka routes messages using keys

4. Intermediate: Impact of partition count on routing

5. Intermediate: Custom partitioners for advanced routing

6. Advanced: Ensuring order and scalability trade-offs

7. Expert: Internal routing optimizations and pitfalls

Under the Hood

Kafka uses a partitioner component in the producer client. When sending a message, the producer checks whether a key exists. If it does, the default partitioner in the Java client hashes the serialized key bytes with murmur2, producing a 32-bit integer, forces the result to be non-negative, and takes it modulo the number of partitions. This calculation is deterministic, so the same key always maps to the same partition unless the partition count changes. If no key is present, the producer spreads messages across partitions instead: older clients use round-robin, and clients from Kafka 2.4 onward default to sticky partitioning, which fills a batch for one partition before moving to another.
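The decision flow described above might be sketched as follows. This is an illustration under stated assumptions: `SketchPartitioner` is a hypothetical class, `zlib.crc32` stands in for murmur2, and the keyless branch uses plain round-robin, so the output will not match a real producer.

```python
import itertools
import zlib
from typing import Optional

class SketchPartitioner:
    """Keyed messages: hash modulo partition count.
    Keyless messages: simple round-robin (an assumption for this sketch)."""

    def __init__(self, num_partitions: int):
        self.num_partitions = num_partitions
        self._counter = itertools.count()

    def partition(self, key: Optional[bytes]) -> int:
        if key is not None:
            # crc32 is a stand-in for murmur2. The mask keeps the value
            # non-negative, which matters for signed 32-bit hashes
            # (Python's crc32 is already unsigned).
            positive_hash = zlib.crc32(key) & 0x7FFFFFFF
            return positive_hash % self.num_partitions
        # No key: cycle through the partitions in turn.
        return next(self._counter) % self.num_partitions

partitioner = SketchPartitioner(num_partitions=3)
print("keyed:  ", [partitioner.partition(b"order-17") for _ in range(3)])
print("keyless:", [partitioner.partition(None) for _ in range(4)])
```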

Why designed this way?

This design balances two needs: preserving order for related messages and distributing load evenly. Hashing keys is fast and deterministic, avoiding the need for centralized routing decisions. Alternatives like random assignment break order guarantees, while centralized routing would reduce scalability and increase latency. The murmur2 hash was chosen for speed and good distribution properties.

Producer Client
┌───────────────────────┐
│ Message with Key?     │
├───────────────┬───────┤
│ Yes           │ No    │
│               │       │
│ Apply Hash    │ Round-│
│ Function      │ Robin │
│ (murmur2)     │       │
│               │       │
│ Calculate     │       │
│ Partition =   │       │
│ hash % N      │       │
└───────┬───────┴───────┘
        │
        ▼
┌───────────────────────┐
│ Send message to       │
│ selected partition    │
└───────────────────────┘

Myth Busters - 4 Common Misconceptions

Quick: do you think messages with the same key always go to the same partition even if partitions change? Commit to yes or no.

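Common Belief: Messages with the same key always go to the same partition regardless of partition count changes.

Reality: The partition is computed from the key hash modulo the partition count, so changing the count can send the same key to a different partition.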

Quick: do you think Kafka guarantees global order of all messages in a topic? Commit to yes or no.

Common Belief: Kafka guarantees the order of all messages across the entire topic.

Reality: Order is guaranteed only within a single partition, not across the partitions of a topic.

Quick: do you think using keys always improves load balancing? Commit to yes or no.

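Common Belief: Using keys always improves load balancing across partitions.

Reality: Skewed keys can concentrate traffic on a few hot partitions, so keyed routing can leave load less even than keyless distribution.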

Quick: do you think all Kafka clients use the same hash function for keys? Commit to yes or no.

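Common Belief: All Kafka clients use the same hash function, so routing is consistent everywhere.

Reality: Some clients use, or allow configuring, a different hash function or partitioner, so the same key can map to different partitions depending on which client produced it.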
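The load-balancing misconception is easy to see with a skewed key distribution. The sketch below uses made-up traffic (one key producing 90 of 100 events) and `zlib.crc32` as a stand-in hash; the names and numbers are assumptions for illustration only.

```python
import zlib
from collections import Counter

def partition_for(key: bytes, num_partitions: int) -> int:
    # crc32 is a stand-in hash for this sketch, not Kafka's hash.
    return zlib.crc32(key) % num_partitions

NUM_PARTITIONS = 4

# Hypothetical traffic: one heavy key plus ten light ones.
events = [b"big-customer"] * 90 + [f"user-{i}".encode() for i in range(10)]

load = Counter(partition_for(key, NUM_PARTITIONS) for key in events)
hot_partition = partition_for(b"big-customer", NUM_PARTITIONS)

for partition in range(NUM_PARTITIONS):
    marker = "  <- hot" if partition == hot_partition else ""
    print(f"partition {partition}: {load[partition]} messages{marker}")
```

Whatever the hash, every event for the heavy key lands in one partition, so that partition carries at least 90% of the traffic.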

Expert Zone

1. Some Kafka clients allow configuring the hash function or partitioner, which can affect routing consistency across systems.

2. Key serialization must be consistent across producers so that identical keys produce identical bytes and therefore identical hashes; subtle differences send the same logical key to different partitions.

3. Sticky partitioning, introduced in Kafka 2.4, improves batching for keyless messages. It does not apply to keyed messages, which are still routed by key hash.
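The sticky behavior for keyless messages might look roughly like this. It is a deliberately simplified sketch: `StickySketch` is a hypothetical class, a fixed record count stands in for the client's real batching rules, and `zlib.crc32` stands in for the key hash.

```python
import random
import zlib
from typing import Optional

class StickySketch:
    """Keyless messages stay on one partition until a 'batch' fills,
    then move to a different one. Keyed messages are hashed as usual."""

    def __init__(self, num_partitions: int, batch_size: int, seed: int = 0):
        self.num_partitions = num_partitions
        self.batch_size = batch_size  # assumption: batch = fixed record count
        self._rng = random.Random(seed)
        self._current = self._rng.randrange(num_partitions)
        self._in_batch = 0

    def partition(self, key: Optional[bytes]) -> int:
        if key is not None:
            return zlib.crc32(key) % self.num_partitions  # stand-in hash
        if self._in_batch == self.batch_size:
            # Batch is full: switch to another partition if one exists.
            others = [p for p in range(self.num_partitions) if p != self._current]
            if others:
                self._current = self._rng.choice(others)
            self._in_batch = 0
        self._in_batch += 1
        return self._current

sticky = StickySketch(num_partitions=3, batch_size=4)
print("keyless:", [sticky.partition(None) for _ in range(8)])
print("keyed:  ", [sticky.partition(b"user-7") for _ in range(3)])
```

The keyless output shows runs of the same partition number rather than a strict rotation, which is what lets batches fill faster.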

When NOT to use

Avoid using keys for routing when message order is not important and you want maximum throughput and even load distribution. In such cases, send keyless messages and let the producer spread them across partitions (round-robin, or sticky partitioning in newer clients). Also, avoid custom partitioners if they add complexity without clear benefits; prefer default hashing for simplicity and compatibility.

Production Patterns

In production, teams often use keys based on user IDs or session IDs to ensure all events for a user go to the same partition. Some teams use custom partitioners to route messages by geographic region and improve data locality. Partition counts are planned carefully and rarely changed, because adding partitions changes the key-to-partition mapping. Monitoring for hot partitions helps detect load imbalances caused by skewed keys.
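A region-aware routing rule could be sketched like this. The region names, the partition layout in `REGION_PARTITIONS`, and the function `region_partition` are all hypothetical; in the Java client, logic of this kind would live in a class implementing the `Partitioner` interface and be enabled through the `partitioner.class` producer setting.

```python
import zlib

# Hypothetical layout: each region owns a fixed slice of a 6-partition topic.
REGION_PARTITIONS = {
    "eu": [0, 1],
    "us": [2, 3],
    "apac": [4, 5],
}

def region_partition(region: str, user_id: str) -> int:
    """Pick a partition inside the region's slice, keyed by user ID.

    crc32 is a stand-in hash for this sketch.
    """
    slots = REGION_PARTITIONS[region]
    return slots[zlib.crc32(user_id.encode("utf-8")) % len(slots)]

for region, user in [("eu", "u-100"), ("us", "u-100"), ("apac", "u-7"), ("eu", "u-100")]:
    print(region, user, "->", region_partition(region, user))
```

Events for one user in one region still share a partition, so per-user ordering is kept while each region's data stays in its own partitions.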

Connections

Consistent Hashing

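Partition key routing hashes keys in a way related to the consistent hashing used in distributed caching, but Kafka's default is plain modulo hashing rather than a hash ring.

Understanding consistent hashing helps explain how Kafka spreads keys evenly, and the contrast is instructive: a hash ring moves only a fraction of keys when nodes change, whereas modulo hashing remaps many keys when the partition count changes.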
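The difference between the two schemes can be measured directly. The sketch below compares modulo assignment with a small hash ring when growing from 3 to 4 nodes; the ring implementation, the MD5-based hash, and all names are assumptions made for illustration, not code from Kafka or any caching library.

```python
import bisect
import hashlib

def h(text: str) -> int:
    """64-bit hash from MD5 (chosen only for its even spread)."""
    return int.from_bytes(hashlib.md5(text.encode("utf-8")).digest()[:8], "big")

def modulo_owner(key: str, num_nodes: int) -> int:
    return h(key) % num_nodes

def make_ring(nodes, vnodes: int = 100):
    """Build a hash ring and return a lookup function key -> node."""
    ring = sorted((h(f"{node}#{v}"), node) for node in nodes for v in range(vnodes))
    points = [point for point, _ in ring]

    def owner(key: str) -> int:
        # First ring point clockwise from the key's hash, wrapping around.
        index = bisect.bisect_right(points, h(key)) % len(ring)
        return ring[index][1]

    return owner

keys = [f"key-{i}" for i in range(10_000)]

mod_moved = sum(modulo_owner(k, 3) != modulo_owner(k, 4) for k in keys) / len(keys)

old_owner, new_owner = make_ring(range(3)), make_ring(range(4))
ring_moved_keys = [k for k in keys if old_owner(k) != new_owner(k)]
ring_moved = len(ring_moved_keys) / len(keys)

print(f"modulo hashing: {mod_moved:.0%} of keys moved")
print(f"hash ring:      {ring_moved:.0%} of keys moved")
```

On the ring, every key that moves goes to the newly added node; with modulo assignment, keys shuffle between existing nodes as well.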

Load Balancing in Web Servers

Kafka's partition routing is like load balancers directing requests based on session IDs to keep user sessions sticky.

Knowing how web load balancers maintain session affinity clarifies why Kafka routes messages by key to preserve order.

Postal Sorting Systems

Kafka's partition key routing mirrors how postal systems sort mail by zip codes to group deliveries.

Seeing Kafka routing as mail sorting reveals the importance of grouping related data for efficient processing.

Common Pitfalls

#1 Changing partition count without rebalancing keys

Wrong approach: Increasing topic partitions from 3 to 6 without migrating or reassigning keys, expecting order to remain intact.

Correct approach: Plan the partition count upfront. If it must change, migrate the data (for example, into a new topic with the target partition count) so that each key's messages end up together again.

Root cause: Not realizing that the partition count affects the key-to-partition mapping and therefore the order guarantees.
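The effect of going from 3 to 6 partitions can be checked with a quick experiment. As before, this is a sketch: `zlib.crc32` stands in for the real hash and the keys are made up.

```python
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    # crc32 is a stand-in hash for this sketch, not Kafka's hash.
    return zlib.crc32(key) % num_partitions

keys = [f"user-{i}".encode() for i in range(1000)]

# Keys whose partition differs once the topic has 6 partitions instead of 3.
moved = [k for k in keys if partition_for(k, 3) != partition_for(k, 6)]
print(f"{len(moved)} of {len(keys)} keys map to a different partition after 3 -> 6")

example = moved[0]
print(example.decode(), ":", partition_for(example, 3), "->", partition_for(example, 6))
```

For a moved key, older messages sit in the original partition and newer ones in another, so a consumer no longer sees that key's full history in order from a single partition.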

#2 Using null keys for messages that require order

Wrong approach: Producing messages without keys but expecting order per user session.

Correct approach: Always include a meaningful key (like user ID) to ensure messages for that user go to the same partition.

Root cause: Not realizing keys are essential for routing and ordering related messages.
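A small comparison shows why keyless messages lose per-user ordering. The sketch assumes round-robin for the keyless case and `zlib.crc32` as a stand-in hash; the event names and the key `user-42` are made up.

```python
import itertools
import zlib

NUM_PARTITIONS = 3
_round_robin = itertools.count()

def keyless_partition() -> int:
    """Keyless routing, modeled here as simple round-robin (an assumption)."""
    return next(_round_robin) % NUM_PARTITIONS

def keyed_partition(key: bytes) -> int:
    """Keyed routing with a stand-in hash."""
    return zlib.crc32(key) % NUM_PARTITIONS

session_events = ["login", "add_to_cart", "checkout", "logout"]

keyless = [keyless_partition() for _ in session_events]
keyed = [keyed_partition(b"user-42") for _ in session_events]

print("keyless partitions:", keyless)  # scattered across partitions
print("keyed partitions:  ", keyed)    # all the same partition
```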

#3 Inconsistent key serialization across producers

Wrong approach: One producer serializes a user ID as a UTF-8 string while another writes it as a binary integer, so the same logical key yields different bytes and different hash results.

Correct approach: Standardize key serialization format across all producers to ensure consistent routing.

Root cause: Ignoring serialization impact on hashing and partition assignment.
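The serialization mismatch is visible at the byte level. In this sketch the same user ID is encoded once as a UTF-8 string and once as an 8-byte big-endian integer, which is how string and long serializers typically behave; `zlib.crc32` again stands in for the real hash, so the partition numbers are illustrative only.

```python
import struct
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    # crc32 is a stand-in hash for this sketch, not Kafka's hash.
    return zlib.crc32(key) % num_partitions

user_id = 42

as_string = str(user_id).encode("utf-8")  # 2 bytes: b"42"
as_long = struct.pack(">q", user_id)      # 8 bytes, big-endian signed integer

print("string key bytes:", as_string.hex())
print("long key bytes:  ", as_long.hex())

# The partitioner only sees bytes, so these are two unrelated keys to it.
print("string key partition:", partition_for(as_string, 6))
print("long key partition:  ", partition_for(as_long, 6))
```

The two encodings may or may not collide on the same partition for a given ID, but nothing guarantees they will, which is enough to break per-key ordering.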

Key Takeaways

Partition keys in Kafka control how messages are routed to partitions, preserving order for related data.

Kafka uses a hash of the key modulo the number of partitions to assign messages deterministically.

Changing the number of partitions can disrupt key routing and message order, so it must be managed carefully.

Custom partitioners allow advanced routing but add complexity and require careful design.

Understanding routing trade-offs helps balance message order guarantees with system scalability and performance.