Overview - ZooKeeper role (and KRaft replacement)

What is it?

ZooKeeper is an external coordination service that Kafka historically used to track the status and configuration of its servers (brokers). It acts as a central registry through which brokers agree on which broker is the controller, which replica leads each partition, and how topics are laid out. KRaft (Kafka Raft) is the newer mode that replaces ZooKeeper by handling this coordination inside Kafka itself, with no separate system to run. This simplifies Kafka's setup and is intended to speed up metadata operations such as controller failover.

Why it matters

Without ZooKeeper or a similar coordination layer, Kafka brokers could not reliably agree on which replica leads each partition or on the current state of the cluster. Disagreement of that kind can cause data loss or downtime. KRaft removes the need for a separate ZooKeeper cluster, which leaves fewer moving parts to deploy, secure, and monitor, and so fewer ways for the applications that depend on Kafka to be disrupted.

Where it fits

Before learning about ZooKeeper and KRaft, you should understand basic Kafka concepts like brokers, topics, and partitions. After this, you can explore Kafka cluster management, fault tolerance, and how Kafka ensures data consistency and availability.

Mental Model

Core Idea

ZooKeeper (or KRaft) acts as the trusted referee that helps Kafka servers agree on who leads and how data is managed to keep the system running smoothly.

Think of it like...

Imagine a group project where one person is the team leader who assigns tasks and keeps everyone on the same page. ZooKeeper is like the teacher who oversees the group, making sure everyone agrees on who the leader is and what each person should do. KRaft is like the group deciding to manage leadership and coordination themselves without needing the teacher.

┌───────────────┐       ┌───────────────┐
│   Kafka       │       │   ZooKeeper   │
│   Brokers     │◄─────►│   Cluster     │
│ (Servers)     │       │ Coordination  │
└───────────────┘       └───────────────┘
          ▲                      ▲
          │                      │
          │                      │
          ▼                      ▼
   Leader Election         Configuration
   & Metadata Sync        Management

In KRaft mode:
┌───────────────┐
│ Kafka Brokers │
│ (Self-Managed │
│  Coordination)│
└───────────────┘

Build-Up - 7 Steps

1. Foundation: What is ZooKeeper in Kafka

Concept: ZooKeeper is introduced as a separate system that Kafka uses to manage its cluster state and coordination.

Kafka brokers need to know who is the leader for each data partition and keep track of cluster membership. ZooKeeper is a distributed service that stores this metadata and helps brokers coordinate by electing leaders and sharing configuration.

Result

Kafka brokers can work together reliably because they use ZooKeeper to agree on leadership and cluster state.

Understanding ZooKeeper's role is key to grasping how Kafka maintains order and consistency across multiple servers.
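
To make the coordination role concrete, here is a minimal in-memory sketch of the ephemeral-node pattern that ZooKeeper-mode Kafka relies on to pick its controller: the first broker to create a well-known node wins, and the node disappears when that broker's session ends. `ToyCoordinator`, `try_become_controller`, `expire_session`, and `elect` are hypothetical names invented for illustration; this is not a ZooKeeper client and not Kafka's actual code.

```python
class ToyCoordinator:
    """In-memory stand-in for a coordination service (not a real ZooKeeper client)."""

    def __init__(self):
        # Broker id that currently owns the ephemeral "controller" node, if any.
        self.controller = None

    def try_become_controller(self, broker_id):
        # Creating the node succeeds only if nobody holds it yet.
        if self.controller is None:
            self.controller = broker_id
            return True
        return False

    def expire_session(self, broker_id):
        # An ephemeral node is deleted when its owner's session ends.
        if self.controller == broker_id:
            self.controller = None


def elect(coordinator, live_brokers):
    """Every live broker races to create the node; return the current holder."""
    for broker_id in live_brokers:
        coordinator.try_become_controller(broker_id)
    return coordinator.controller


coordinator = ToyCoordinator()
first = elect(coordinator, [1, 2, 3])   # broker 1 gets there first
coordinator.expire_session(first)       # broker 1 crashes, its node vanishes
second = elect(coordinator, [2, 3])     # the survivors race again
print(first, second)
```

The point of the sketch is that brokers never negotiate with each other directly: they all ask the same coordinator, so there is exactly one answer to "who is in charge" at any time.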

2. Foundation: Why Kafka needs coordination

3. Intermediate: How ZooKeeper manages Kafka cluster state

4. Intermediate: Limitations of ZooKeeper in Kafka

5. Intermediate: Introduction to KRaft mode in Kafka

6. Advanced: How KRaft manages metadata internally

7. Expert: Challenges and tradeoffs in KRaft design

Under the Hood

ZooKeeper works as a separate distributed system that stores small pieces of metadata in a hierarchical namespace of nodes called znodes. It uses a consensus protocol called Zab to ensure all ZooKeeper nodes agree on updates. Kafka brokers connect to ZooKeeper to read and write metadata, take part in leader elections, and watch for changes. KRaft replaces ZooKeeper by embedding a Raft consensus group inside Kafka itself. A set of Kafka nodes acting as controllers forms this group, replicates the metadata as a log, and elects its own leader internally, while the brokers follow that log. No external system is needed.
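
The hierarchical namespace and the watch mechanism can be modeled in a few lines. The sketch below is a toy, assuming one-shot watches as in classic ZooKeeper; `ToyZNodeStore` and its methods are hypothetical names, and the paths and host names are only illustrative of the kind of layout Kafka keeps there.

```python
class ToyZNodeStore:
    """Toy hierarchical key-value store with one-shot watches (illustrative only)."""

    def __init__(self):
        self.nodes = {"/": b""}   # path -> data
        self.watches = {}         # path -> callbacks waiting for the next change

    def create(self, path, data=b""):
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.nodes:
            raise KeyError(f"parent {parent} does not exist")
        if path in self.nodes:
            raise KeyError(f"{path} already exists")
        self.nodes[path] = data

    def get(self, path, watch=None):
        # Optionally register a callback for the next change to this path.
        if watch is not None:
            self.watches.setdefault(path, []).append(watch)
        return self.nodes[path]

    def set(self, path, data):
        if path not in self.nodes:
            raise KeyError(path)
        self.nodes[path] = data
        # Watches fire once and are then removed.
        for callback in self.watches.pop(path, []):
            callback(path, data)

    def children(self, path):
        prefix = path.rstrip("/") + "/"
        names = []
        for candidate in self.nodes:
            rest = candidate[len(prefix):]
            # Keep direct children only: non-empty remainder with no further "/".
            if candidate.startswith(prefix) and rest and "/" not in rest:
                names.append(rest)
        return sorted(names)


store = ToyZNodeStore()
store.create("/brokers")
store.create("/brokers/ids")
store.create("/brokers/ids/1", b"host-a:9092")
store.create("/brokers/ids/2", b"host-b:9092")

events = []
store.get("/brokers/ids/1", watch=lambda path, data: events.append((path, data)))
store.set("/brokers/ids/1", b"host-c:9092")   # fires the registered watch
store.set("/brokers/ids/1", b"host-d:9092")   # no watch left, nothing fires
print(store.children("/brokers/ids"), events)
```

Because a watch fires only once in this model, a client that wants continuous updates has to re-register after every notification, which is one reason watch-heavy coordination becomes chatty as metadata grows.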

Why designed this way?

ZooKeeper was chosen originally because it was a proven, reliable coordination service that could be reused across many systems. However, running a separate ZooKeeper cluster added operational overhead. Kafka's designers created KRaft to simplify architecture by integrating coordination directly, reducing dependencies and improving scalability. Raft was chosen for its understandability and strong consistency guarantees.

ZooKeeper Mode:
┌───────────────┐       ┌───────────────┐       ┌───────────────┐
│ Kafka Broker 1│◄─────►│               │◄─────►│ Kafka Broker 2│
│               │       │  ZooKeeper    │       │               │
│               │       │  Ensemble     │       │               │
└───────────────┘       └───────────────┘       └───────────────┘

KRaft Mode:
┌───────────────┐       ┌───────────────┐       ┌───────────────┐
│ Kafka Broker 1│◄─────►│ Kafka Broker 2│◄─────►│ Kafka Broker 3│
│ (KRaft Node)  │       │ (KRaft Node)  │       │ (KRaft Node)  │
└───────────────┘       └───────────────┘       └───────────────┘

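The diagram shows the combined case, where every node acts as both broker and controller. In general, only the nodes configured as controllers vote in the metadata quorum; the remaining brokers replicate the resulting metadata log without voting.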
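
The core rule behind this kind of consensus is that a metadata record counts as committed once a majority of the quorum's voters have stored it. The following is a minimal sketch of that rule only, assuming a hypothetical `committed_index` helper; it leaves out Raft's term checks, elections, and log repair, and it is not Kafka's implementation.

```python
def committed_index(match_index):
    """Return the highest log index stored on a majority of voters.

    match_index holds, for each voter (leader included), the highest
    log index that voter has durably stored.
    """
    majority = len(match_index) // 2 + 1
    # After sorting high-to-low, the value at position majority-1 is
    # stored on at least `majority` voters.
    return sorted(match_index, reverse=True)[majority - 1]


print(committed_index([7, 7, 5]))  # two of three voters hold index 7
print(committed_index([7, 5, 4]))  # only index 5 is on a majority
print(committed_index([7, 0, 0]))  # the leader alone cannot commit
```

A leader that has written a record locally but cannot reach a majority makes no progress, which is what keeps two partitions of the cluster from committing conflicting metadata.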

Myth Busters - 4 Common Misconceptions

Quick: Does ZooKeeper store Kafka's actual message data? Commit to yes or no.

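Common Belief: ZooKeeper stores all Kafka data including messages.

Reality: ZooKeeper holds only cluster metadata and coordination state. Messages are stored in the brokers' own log directories.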

Quick: Is KRaft just an add-on that works alongside ZooKeeper? Commit to yes or no.

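Common Belief: KRaft works together with ZooKeeper to manage Kafka clusters.

Reality: KRaft replaces ZooKeeper. A cluster in KRaft mode handles coordination internally and does not need a ZooKeeper ensemble.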

Quick: Does removing ZooKeeper make Kafka's internal coordination simpler? Commit to yes or no.

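Common Belief: Removing ZooKeeper simplifies Kafka's internal architecture completely.

Reality: Deployment gets simpler, but the coordination work does not disappear. Kafka now runs its own Raft quorum, so cluster availability depends on that quorum's health.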

Quick: Can ZooKeeper scale infinitely with Kafka clusters without issues? Commit to yes or no.

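Common Belief: ZooKeeper scales easily and is not a bottleneck for Kafka clusters.

Reality: ZooKeeper is built for small pieces of metadata and adds operational overhead as a separate cluster. Improving scalability was one of the reasons Kafka's designers created KRaft.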

Expert Zone

1. KRaft requires a majority (quorum) of its controller nodes to be available for metadata operations, so cluster availability depends on quorum health.

2. ZooKeeper's separate cluster allows independent scaling and tuning, which can be advantageous in some complex deployments.

3. KRaft's metadata log is append-only and immutable, enabling easier recovery and auditability compared to ZooKeeper's znode state.
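
The recovery and auditability benefit of an append-only log comes from replay: current state is just the result of applying every record in order, and any earlier state can be rebuilt by stopping early. Here is a small sketch of that idea, assuming invented record shapes (`"set"` and `"delete"` tuples) and a hypothetical `replay` function; real KRaft records are typed and more detailed.

```python
def replay(log, upto=None):
    """Rebuild state by applying records in order.

    `upto` limits replay to the first `upto` records, which shows
    what the state looked like at that point in history.
    """
    state = {}
    for op, key, value in log[:upto]:
        if op == "set":
            state[key] = value
        elif op == "delete":
            state.pop(key, None)
    return state


# Records are only ever appended; nothing is edited in place.
log = [
    ("set", "topic:orders", {"partitions": 3}),
    ("set", "topic:clicks", {"partitions": 6}),
    ("set", "topic:orders", {"partitions": 12}),
    ("delete", "topic:clicks", None),
]

print(replay(log))           # current state
print(replay(log, upto=2))   # state as of the second record
```

A store that keeps only the latest value per key can answer the first question but not the second, which is the contrast with znode state drawn in the list above.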

When NOT to use

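KRaft is not recommended for production on Kafka versions before 3.3, where it was not yet marked production-ready, or for clusters that depend on legacy ZooKeeper-only features. On the 3.x line, some operators of very large or complex environments may still prefer ZooKeeper for independent scaling and mature tooling. That choice ends with Kafka 4.0, which removes ZooKeeper mode entirely.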

Production Patterns

Many production Kafka clusters now run in KRaft mode to reduce operational overhead. Operators use multi-node KRaft quorum setups for fault tolerance and monitor metadata logs closely. Migration from ZooKeeper to KRaft is planned carefully to avoid downtime.
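
The reason for multi-node quorums is simple arithmetic: a quorum keeps working only while a majority of its voters is alive. A short sketch of that arithmetic follows; the function names are hypothetical helpers, not part of any Kafka tool.

```python
def majority(voters):
    """Smallest number of voters that forms a majority."""
    return voters // 2 + 1


def tolerated_failures(voters):
    """How many voters can fail while a majority remains."""
    return voters - majority(voters)


def has_quorum(voters, alive):
    """True when enough voters are alive to keep committing metadata."""
    return alive >= majority(voters)


for voters in (1, 3, 4, 5):
    print(voters, majority(voters), tolerated_failures(voters))
```

Under this arithmetic a four-node quorum tolerates no more failures than a three-node one, which is why odd sizes such as three or five are the usual choice.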

Connections

Distributed Consensus Algorithms

KRaft uses the Raft consensus algorithm, a core distributed consensus method.

Understanding Raft helps grasp how Kafka achieves reliable coordination without external systems.

Leader Election in Distributed Systems

ZooKeeper and KRaft both support leader election: one election picks the controller, and the controller in turn assigns a leader replica to each partition.

Knowing leader election principles clarifies how Kafka maintains availability and consistency.

Project Management Coordination

ZooKeeper's role is like a project manager coordinating team members to avoid conflicts.

Seeing coordination as a human process helps understand why distributed systems need consensus.

Common Pitfalls

#1 Trying to run Kafka with neither ZooKeeper nor KRaft configured.

Wrong approach: bin/kafka-server-start.sh config/server.properties # In the Kafka 3.x layout this file expects ZooKeeper at zookeeper.connect; with no ZooKeeper running and no KRaft settings, the broker cannot start

Correct approach: bin/kafka-server-start.sh config/kraft/server.properties # KRaft mode enabled with proper configuration; the storage directory must first be formatted with bin/kafka-storage.sh format

Root cause: Misunderstanding that Kafka requires either ZooKeeper or KRaft for cluster coordination.

#2 Assuming ZooKeeper stores Kafka messages, leading to wrong backup strategies.

Wrong approach: Backing up only ZooKeeper data to recover Kafka messages.

Correct approach: Backing up Kafka log directories where actual messages are stored.

Root cause: Confusing metadata storage with message storage.

#3 Mixing ZooKeeper and KRaft configurations in the same Kafka cluster.

Wrong approach: Configuring some brokers with ZooKeeper and others with KRaft in the same cluster.

Correct approach: Using either ZooKeeper mode or KRaft mode consistently across all brokers.

Root cause: Not understanding that ZooKeeper and KRaft modes are mutually exclusive.
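
The first and third pitfalls can be caught before startup with a simple configuration check. The sketch below assumes that the presence of `process.roles` marks a KRaft config and the presence of `zookeeper.connect` marks a ZooKeeper config; the helper functions, broker names, and host names are hypothetical, and the sketch deliberately ignores the temporary bridge configuration used during a ZooKeeper-to-KRaft migration.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def detect_mode(props):
    """Classify one broker config as 'kraft', 'zookeeper', 'mixed', or 'none'."""
    has_kraft = "process.roles" in props
    has_zookeeper = "zookeeper.connect" in props
    if has_kraft and has_zookeeper:
        return "mixed"
    if has_kraft:
        return "kraft"
    if has_zookeeper:
        return "zookeeper"
    return "none"


def cluster_is_consistent(configs):
    """True when every broker config uses the same single mode."""
    modes = {detect_mode(parse_properties(text)) for text in configs.values()}
    return modes in ({"kraft"}, {"zookeeper"})


# Hypothetical two-broker cluster that mixes modes (pitfall #3).
configs = {
    "broker-1": "process.roles=broker,controller\nnode.id=1\n",
    "broker-2": "# legacy settings\nbroker.id=2\nzookeeper.connect=zk1:2181\n",
}

for name, text in configs.items():
    print(name, detect_mode(parse_properties(text)))
print("consistent:", cluster_is_consistent(configs))
```

A config classified as `none` corresponds to pitfall #1, and a cluster for which `cluster_is_consistent` returns false corresponds to pitfall #3.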

Key Takeaways

ZooKeeper is a separate system Kafka originally used to manage cluster metadata and coordination.

Kafka needs coordination to elect leaders and keep cluster state consistent for reliable data streaming.

KRaft is Kafka's built-in replacement for ZooKeeper that simplifies deployment by handling coordination internally.

KRaft uses the Raft consensus algorithm to replicate metadata across brokers without external dependencies.

Understanding the tradeoffs between ZooKeeper and KRaft helps operate Kafka clusters effectively and plan migrations.