Kafka · Concept · Beginner · 4 min read

Schema Evolution in Kafka: What It Is and How It Works

Schema evolution in Kafka is the process of safely changing the data format (schema) used in messages over time without breaking existing consumers. It allows producers and consumers to handle new or modified fields by using compatible schema versions registered in a Schema Registry.
⚙️

How It Works

Imagine you have a form that people fill out, and over time you want to add or change some questions without confusing those who already have old forms. Schema evolution in Kafka works similarly by letting you update the structure of your data messages while keeping old and new versions compatible.

Kafka uses a Schema Registry to store the versions of each schema. When a producer sends data, it embeds the ID of the registered schema in the message. Consumers look up that ID in the registry and know how to read the data, even if the schema has changed. This way, new fields can be added, or fields can be removed or renamed, as long as the changes follow the registry's compatibility rules.

This process ensures that your data pipeline keeps running smoothly even when your data format evolves, avoiding crashes or data loss.
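To make the idea concrete, here is a minimal plain-Java sketch of what "reading old data with a new schema" means. It is an illustration only, not real Avro or Schema Registry code: a "schema" is simplified to a map of field names to default values, and a record written under the old schema is filled in with the new schema's defaults on read. All names (`CompatibilityDemo`, `read`) are hypothetical.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only: real Kafka clients delegate this logic to Avro
// schema resolution. Here a "schema" is just field name -> default value.
public class CompatibilityDemo {

    // Read a record written with an older schema using a newer reader schema:
    // any field missing from the record falls back to the reader's default.
    static Map<String, Object> read(Map<String, Object> record,
                                    Map<String, Object> readerDefaults) {
        Map<String, Object> result = new LinkedHashMap<>(readerDefaults);
        result.putAll(record); // values actually present win over defaults
        return result;
    }

    public static void main(String[] args) {
        // Record produced with schema V1, which only has "name"
        Map<String, Object> v1Record = new HashMap<>();
        v1Record.put("name", "alice");

        // Reader schema V2 adds an optional "age" with default null
        Map<String, Object> v2Defaults = new LinkedHashMap<>();
        v2Defaults.put("name", "");
        v2Defaults.put("age", null);

        // The old record is still readable; "age" falls back to null
        System.out.println(read(v1Record, v2Defaults)); // prints {name=alice, age=null}
    }
}
```

This is exactly why the optional field in the example below carries a `"default": null`: without a default, a new consumer would have no value to fall back on when reading old records.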

💻

Example

This example registers two versions of an Avro schema under the same subject in the Schema Registry. Version 2 adds an optional age field with a default value, which is a backward-compatible change.

```java
import io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient;
import io.confluent.kafka.schemaregistry.client.SchemaRegistryClient;
import io.confluent.kafka.schemaregistry.avro.AvroSchema;
import org.apache.avro.Schema;

public class SchemaEvolutionExample {
    public static void main(String[] args) throws Exception {
        String schemaRegistryUrl = "http://localhost:8081";
        SchemaRegistryClient client = new CachedSchemaRegistryClient(schemaRegistryUrl, 10);

        // Version 1 schema
        String userSchemaV1 = "{\n" +
                " \"type\": \"record\",\n" +
                " \"name\": \"User\",\n" +
                " \"fields\": [\n" +
                "   {\"name\": \"name\", \"type\": \"string\"}\n" +
                " ]\n" +
                "}";

        // Register version 1
        Schema schemaV1 = new Schema.Parser().parse(userSchemaV1);
        int idV1 = client.register("User-value", new AvroSchema(schemaV1));

        // Version 2 schema with new optional field 'age'
        String userSchemaV2 = "{\n" +
                " \"type\": \"record\",\n" +
                " \"name\": \"User\",\n" +
                " \"fields\": [\n" +
                "   {\"name\": \"name\", \"type\": \"string\"},\n" +
                "   {\"name\": \"age\", \"type\": [\"null\", \"int\"], \"default\": null}\n" +
                " ]\n" +
                "}";

        // Register version 2
        Schema schemaV2 = new Schema.Parser().parse(userSchemaV2);
        int idV2 = client.register("User-value", new AvroSchema(schemaV2));

        System.out.println("Registered schema V1 id: " + idV1);
        System.out.println("Registered schema V2 id: " + idV2);
    }
}
```

Output

Registered schema V1 id: 1
Registered schema V2 id: 2
🎯

When to Use

Use schema evolution in Kafka when your data format needs to change over time but you want to keep your system running without interruptions. For example:

  • Adding new fields to messages without breaking old consumers.
  • Removing or deprecating fields safely.
  • Changing data types with backward or forward compatibility.

This is common in event-driven systems, microservices, and data pipelines where producers and consumers evolve independently.
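Compatibility rules are enforced per subject by the Schema Registry itself. As a sketch, assuming a Confluent Schema Registry running at http://localhost:8081 (as in the example above), the compatibility mode for the User-value subject could be set through the registry's REST API:

```shell
# Set the compatibility mode for the User-value subject to BACKWARD,
# so each new schema version must be able to read data written with
# the previous version.
curl -X PUT \
  -H "Content-Type: application/vnd.schemaregistry.v1+json" \
  --data '{"compatibility": "BACKWARD"}' \
  http://localhost:8081/config/User-value
```

Other modes include FORWARD, FULL, and NONE; with BACKWARD set, the registry would reject a version 2 that, say, removed the required name field.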

Key Points

  • Schema evolution allows safe changes to message formats in Kafka.
  • It relies on a Schema Registry to manage schema versions.
  • Compatibility rules ensure producers and consumers can work together.
  • Common changes include adding optional fields or default values.
  • It prevents data processing errors during schema changes.

Key Takeaways

  • Schema evolution lets Kafka handle changes in data format without breaking consumers.
  • A Schema Registry stores and manages different schema versions for compatibility.
  • Use schema evolution to add, remove, or modify fields safely in your Kafka messages.
  • Compatibility rules (backward, forward) guide how schemas can evolve without errors.
  • Schema evolution is essential for maintaining stable, flexible data pipelines.