
Avro vs Protobuf vs JSON Schema in Kafka: Key Differences and Usage

In Kafka, Avro, Protobuf, and JSON Schema are popular serialization formats used with Schema Registry to enforce data structure. Avro is compact and schema-evolution friendly, Protobuf offers high performance and strict typing, while JSON Schema is human-readable and flexible but less compact. Choose based on your need for speed, readability, and schema evolution support.

Quick Comparison

Here is a quick overview comparing Avro, Protobuf, and JSON Schema for Kafka serialization.

Feature | Avro | Protobuf | JSON Schema
--- | --- | --- | ---
Data Format | Binary (compact) | Binary (very compact) | Text (JSON)
Schema Evolution | Strong support with defaults | Strong support, requires careful versioning | Flexible but less strict
Readability | Low (binary) | Low (binary) | High (human-readable JSON)
Performance | Fast serialization/deserialization | Faster serialization/deserialization | Slower due to text parsing
Typing | Dynamic with schema | Strongly typed | Dynamic, loosely typed
Integration | Widely used with Confluent Schema Registry | Supported by Schema Registry | Supported by Schema Registry

Key Differences

Avro uses a compact binary format with schemas stored separately in a Schema Registry. It supports schema evolution well by allowing default values and field additions without breaking consumers. This makes it popular for Kafka where data changes over time.
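As a minimal sketch of what that evolution looks like, the User schema from the example below could gain an optional email field with a default value, so consumers on the new schema can still decode records written without it (the email field is purely illustrative):

```json
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "age", "type": "int"},
    {"name": "email", "type": ["null", "string"], "default": null}
  ]
}
```

Because the added field carries a default, this version passes Schema Registry's check under the default BACKWARD compatibility mode.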

Protobuf also uses a compact binary format but enforces strict typing and requires explicit field numbering. It offers very fast serialization and deserialization, making it ideal for performance-critical Kafka streams. However, schema evolution needs careful management to avoid breaking changes.
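Those explicit field numbers are not just labels; they are what actually goes on the wire, which is why renumbering fields is a breaking change. As a rough stdlib-only sketch (not a replacement for the generated Protobuf code), a User message for Alice can be hand-assembled from the proto3 wire-format rules: each field is a tag byte encoding the field number and wire type, followed by a varint or length-delimited value:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class WireFormatSketch {
    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();

        // Field 1 (name, wire type 2 = length-delimited): tag = (1 << 3) | 2
        byte[] name = "Alice".getBytes(StandardCharsets.UTF_8);
        out.write((1 << 3) | 2);
        out.write(name.length);   // lengths under 128 fit in one varint byte
        out.write(name);

        // Field 2 (age, wire type 0 = varint): tag = (2 << 3) | 0
        out.write((2 << 3) | 0);
        out.write(30);            // values under 128 fit in one varint byte

        // 1 tag + 1 length + 5 chars + 1 tag + 1 value = 9 bytes
        System.out.println("Hand-encoded Protobuf bytes length: " + out.size());
    }
}
```

Note that the field names "name" and "age" never appear in the bytes; only the numbers 1 and 2 do, which is where much of Protobuf's compactness comes from.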

JSON Schema uses human-readable JSON text format, which is easy to debug and flexible. It is less compact and slower to process compared to Avro and Protobuf. It is useful when readability and flexibility are more important than performance, but schema evolution is less strict and can lead to inconsistencies.
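To make the size difference concrete, here is a small stdlib-only sketch: the same User record as plain JSON text, the form a JSON serializer would put on the wire, is several times larger than the binary encodings above, mostly because every message repeats the field names:

```java
import java.nio.charset.StandardCharsets;

public class JsonSizeSketch {
    public static void main(String[] args) {
        // The same record, serialized as human-readable JSON text
        String json = "{\"name\":\"Alice\",\"age\":30}";
        byte[] bytes = json.getBytes(StandardCharsets.UTF_8);
        System.out.println("Serialized JSON bytes length: " + bytes.length); // 25 bytes
    }
}
```

The trade-off is that those extra bytes buy you payloads you can read in a consumer log or debug with any text tool, no schema lookup required.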


Avro Code Example

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.io.Encoder;
import org.apache.avro.io.EncoderFactory;
import java.io.ByteArrayOutputStream;

public class AvroExample {
    public static void main(String[] args) throws Exception {
        // Define the schema inline; in a real Kafka setup it would live in the Schema Registry
        String schemaJson = "{\"type\":\"record\",\"name\":\"User\",\"fields\":[{\"name\":\"name\",\"type\":\"string\"},{\"name\":\"age\",\"type\":\"int\"}]}";
        Schema schema = new Schema.Parser().parse(schemaJson);

        GenericRecord user = new GenericData.Record(schema);
        user.put("name", "Alice");
        user.put("age", 30);

        // Serialize the record to Avro's compact binary encoding
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DatumWriter<GenericRecord> writer = new GenericDatumWriter<>(schema);
        Encoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        writer.write(user, encoder);
        encoder.flush();
        out.close();

        byte[] serializedBytes = out.toByteArray();
        System.out.println("Serialized Avro bytes length: " + serializedBytes.length);
    }
}
```
Output
Serialized Avro bytes length: 7

Protobuf Equivalent

First, the schema is declared in a `.proto` file:

```proto
syntax = "proto3";

message User {
  string name = 1;
  int32 age = 2;
}
```

The Java code then serializes using the class generated from that file:

```java
import com.example.UserOuterClass.User;

public class ProtobufExample {
    public static void main(String[] args) {
        User user = User.newBuilder()
            .setName("Alice")
            .setAge(30)
            .build();

        byte[] serializedBytes = user.toByteArray();
        System.out.println("Serialized Protobuf bytes length: " + serializedBytes.length);
    }
}
```
Output
Serialized Protobuf bytes length: 9

When to Use Which

Choose Avro when you want good schema evolution support with compact binary format and wide Kafka ecosystem integration.

Choose Protobuf if you need the fastest serialization with strict typing and can manage schema versions carefully.

Choose JSON Schema when human readability and flexibility are more important than performance, such as in development or debugging phases.

Key Takeaways

Avro offers a good balance of compactness and schema evolution for Kafka data.
Protobuf provides the fastest serialization with strict typing but needs careful schema management.
JSON Schema is human-readable but slower and less compact, best for flexibility and debugging.
All three integrate with Kafka Schema Registry for schema enforcement.
Pick based on your priorities: performance, readability, or schema evolution.