
Event streaming concept in Kafka - Deep Dive

Overview - Event streaming concept
What is it?
Event streaming is a way to send and receive data continuously as it happens. It treats data as a series of events, like messages, that flow through a system in real time. This allows applications to react quickly to new information without waiting for batch updates. Apache Kafka is a popular tool that helps manage and process these event streams efficiently.
Why it matters
Without event streaming, systems often rely on slow batch processing that delays important updates and decisions. Event streaming solves this by enabling instant data flow, which is crucial for things like fraud detection, live analytics, and real-time user experiences. It makes systems more responsive and scalable, improving how businesses operate and serve customers.
Where it fits
Before learning event streaming, you should understand basic messaging systems and data flow concepts. After mastering event streaming, you can explore stream processing frameworks, real-time analytics, and event-driven architectures to build complex, reactive systems.
Mental Model
Core Idea
Event streaming is like a continuous river of data events flowing through a system, allowing real-time processing and reaction.
Think of it like...
Imagine a conveyor belt in a factory carrying packages one after another. Each package is an event with information. Workers (applications) pick up packages as they pass by and act immediately, instead of waiting for all packages to arrive before starting work.
┌───────────────┐    ┌───────────────┐    ┌───────────────┐
│ Event Source  │───▶│ Event Stream  │───▶│ Event Consumer│
└───────────────┘    └───────────────┘    └───────────────┘
       │                    │                    │
       │ Continuous flow of events (messages)    │
       └────────────────────────────────────────┘
Build-Up - 7 Steps
1
Foundation: Understanding events and streams
🤔
Concept: Learn what events are and how they form streams of data.
An event is a record of something that happened, like a user clicking a button or a sensor reading. When many events happen over time, they form a stream, which is like a timeline of these events flowing continuously.
Result
You can now identify events as individual data points and streams as their continuous flow.
Understanding that data can be seen as a sequence of events flowing over time is the foundation of event streaming.
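The step above can be sketched in a few lines of Python. This is a toy model, not Kafka's API: an event is a keyed, timestamped record, and a stream is just an ordered, ever-growing sequence of them.

```python
from dataclasses import dataclass
import time

# A hypothetical event record: a timestamped fact about something that happened.
@dataclass
class Event:
    key: str          # what the event is about (e.g. a user id)
    value: dict       # payload describing what happened
    timestamp: float

# A stream is an ordered sequence of such events, growing over time.
stream = [
    Event("user-1", {"action": "click", "target": "buy"}, time.time()),
    Event("user-2", {"action": "page_view", "page": "/home"}, time.time()),
    Event("user-1", {"action": "click", "target": "cart"}, time.time()),
]

# Applications react to events as they arrive instead of waiting for a batch.
clicks = [e for e in stream if e.value["action"] == "click"]
print(len(clicks))  # 2
```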
2
Foundation: Basics of message brokers
🤔
Concept: Introduce message brokers as systems that move events between producers and consumers.
Message brokers like Kafka act like post offices for events. Producers send events to the broker, which stores and forwards them to consumers. This decouples the sender and receiver, allowing them to work independently.
Result
You understand how events travel from producers to consumers through a broker.
Knowing that brokers manage event delivery helps grasp how event streaming systems handle data flow reliably.
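The broker's decoupling role can be illustrated with a toy in-memory sketch (the `ToyBroker` class below is hypothetical, not Kafka's actual interface): producers publish without knowing who will read, and consumers poll without knowing who wrote.

```python
from collections import defaultdict, deque

# Toy in-memory broker: stores events per topic until consumers fetch them.
class ToyBroker:
    def __init__(self):
        self.queues = defaultdict(deque)

    def publish(self, topic, event):
        self.queues[topic].append(event)      # producer side

    def poll(self, topic):
        q = self.queues[topic]
        return q.popleft() if q else None     # consumer side

broker = ToyBroker()
broker.publish("orders", {"order_id": 1})     # producer runs now...
broker.publish("orders", {"order_id": 2})
print(broker.poll("orders"))                  # ...consumer reads later: {'order_id': 1}
```

Note that unlike this queue, a real Kafka broker does not delete an event when one consumer reads it; that difference is covered in step 4.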
3
Intermediate: Kafka topics and partitions
🤔 Before reading on: do you think Kafka stores all events in one place or splits them? Commit to your answer.
Concept: Kafka organizes events into topics, which are split into partitions for scalability and parallelism.
A topic is like a category or channel for events. Each topic is divided into partitions, which are separate logs that store events in order. This allows Kafka to handle large volumes of data by spreading load across partitions.
Result
You see how Kafka structures event streams for efficient storage and processing.
Understanding partitions explains how Kafka achieves high throughput and fault tolerance.
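Key-to-partition assignment can be sketched as hash-then-modulo. Kafka's default partitioner hashes the key bytes with murmur2; the md5-based function below is a stand-in showing the same idea.

```python
import hashlib

NUM_PARTITIONS = 3

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Use a stable digest: Python's built-in hash() is salted per process.
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# The same key always maps to the same partition, preserving per-key order.
assert partition_for("user-1") == partition_for("user-1")

# Different keys spread across partitions, which is what enables parallelism.
hits = {partition_for(f"user-{i}") for i in range(100)}
print(sorted(hits))  # typically every partition receives some keys
```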
4
Intermediate: Producer and consumer roles
🤔 Before reading on: do you think consumers must read events in real time only, or can they read past events? Commit to your answer.
Concept: Producers send events to topics, and consumers read events at their own pace, including past events.
Producers write events to Kafka topics. Consumers subscribe to topics and read events from partitions. Kafka stores events for a configurable time, so consumers can replay or catch up on missed events anytime.
Result
You understand the flexible roles of producers and consumers in event streaming.
Knowing consumers can replay events enables building reliable and fault-tolerant systems.
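Replay falls out of the storage model: a partition is an append-only list, and a consumer is just an offset into it. A minimal sketch with hypothetical `produce`/`consume_from` helpers:

```python
log = []  # one partition's append-only log

def produce(event):
    log.append(event)                 # writes only ever append

def consume_from(offset, max_events=10):
    batch = log[offset : offset + max_events]
    return batch, offset + len(batch) # events plus the advanced offset

for i in range(5):
    produce({"seq": i})

batch, offset = consume_from(0)       # read everything once
assert offset == 5

replay, _ = consume_from(2)           # rewind to offset 2 and replay
print([e["seq"] for e in replay])     # [2, 3, 4]
```

Reading never deletes anything, so any number of consumers can hold independent offsets into the same log.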
5
Intermediate: Event ordering and delivery guarantees
🤔
Concept: Learn how Kafka ensures event order within partitions and different delivery guarantees.
Within each partition, events are stored in the order they arrive, so consumers read them in that order. Kafka offers delivery guarantees like "at least once" (events may be repeated) and "exactly once" (no duplicates), important for data accuracy.
Result
You grasp how Kafka maintains order and controls event delivery quality.
Understanding ordering and delivery guarantees is key to designing correct event-driven applications.
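One consequence of "at least once" is that consumers should handle duplicates. The sketch below deduplicates redelivered events by id, turning at-least-once delivery into an exactly-once effect (a common idempotent-consumer pattern; event shapes are illustrative):

```python
seen_ids = set()
total = 0

def handle(event):
    global total
    if event["id"] in seen_ids:       # duplicate from a redelivery: skip it
        return
    seen_ids.add(event["id"])
    total += event["amount"]

deliveries = [
    {"id": "e1", "amount": 10},
    {"id": "e2", "amount": 5},
    {"id": "e1", "amount": 10},       # broker redelivered e1 after a timeout
]
for event in deliveries:
    handle(event)

print(total)  # 15, not 25: duplicates were delivered but processed once
```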
6
Advanced: Stream processing with Kafka
🤔 Before reading on: do you think event streaming is only about moving data, or can it also transform data on the fly? Commit to your answer.
Concept: Event streaming includes processing streams in real time to transform, filter, or aggregate data as it flows.
Kafka integrates with stream processing tools like Kafka Streams or ksqlDB that let you write programs to process events continuously. For example, you can detect fraud by analyzing transactions as they happen.
Result
You see how event streaming enables real-time data processing, not just transport.
Knowing that event streaming supports live processing unlocks powerful use cases beyond simple messaging.
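A stateful stream processor can be sketched as a running aggregate that emits derived events. The toy fraud check below counts transactions per card and flags a card when it crosses a threshold; Kafka Streams and ksqlDB express this same pattern declaratively over real topics (the field names here are illustrative):

```python
from collections import Counter

counts = Counter()   # processor state: running count per card
flagged = []         # derived "alert" events

def process(txn, threshold=3):
    counts[txn["card"]] += 1
    if counts[txn["card"]] == threshold:
        flagged.append(txn["card"])   # emit an alert exactly once per card

stream = [{"card": c} for c in ["A", "B", "A", "A", "C", "B"]]
for txn in stream:
    process(txn)          # each event updates state as it flows past

print(flagged)  # ['A']
```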
7
Expert: Handling failures and scaling in Kafka
🤔 Before reading on: do you think Kafka loses events if a server crashes, or does it have ways to prevent data loss? Commit to your answer.
Concept: Kafka uses replication and partition leaders to handle failures and scale without losing data.
Each partition is replicated across multiple Kafka brokers. One broker acts as leader for a partition, handling reads and writes. If a leader fails, another replica takes over seamlessly. This design ensures no data loss and allows Kafka to scale horizontally.
Result
You understand Kafka's internal fault tolerance and scalability mechanisms.
Knowing Kafka's replication and leader election prevents surprises in production and guides system design.
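Failover can be simulated with a few dicts: every replica holds a full copy of the partition log, so when the leader disappears, a follower takes over without losing events. A toy sketch (replication is synchronous here for simplicity; real Kafka followers fetch from the leader asynchronously):

```python
# Three brokers, each holding a replica of one partition's log.
replicas = {"broker-1": [], "broker-2": [], "broker-3": []}
leader = "broker-1"

def write(event):
    replicas[leader].append(event)            # leader accepts the write
    for name, log in replicas.items():        # followers copy it
        if name != leader:
            log.append(event)

write({"seq": 0})
write({"seq": 1})

# broker-1 crashes; a follower with a full copy is elected leader.
del replicas["broker-1"]
leader = "broker-2"
write({"seq": 2})

print([e["seq"] for e in replicas["broker-2"]])  # [0, 1, 2] -- nothing lost
```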
Under the Hood
Kafka stores events as immutable logs on disk, organized by topic and partition. Producers append events to partition logs sequentially. Consumers track their position (offset) in each partition to read events in order. Kafka replicates partitions across brokers for fault tolerance. Leader brokers coordinate writes and reads, while followers replicate data asynchronously. This design allows high throughput, durability, and scalability.
Why designed this way?
Kafka was designed to handle massive data streams with low latency and high reliability. Using append-only logs simplifies storage and recovery. Partitioning enables parallelism and scaling. Replication ensures no data loss during failures. Alternatives like traditional message queues lacked Kafka's combination of durability, scalability, and replayability, which were critical for modern data pipelines.
┌───────────────┐       ┌───────────────┐       ┌───────────────┐
│   Producer    │──────▶│ Partition Log │──────▶│   Consumer    │
└───────────────┘       └───────────────┘       └───────────────┘
         │                      ▲   ▲                      │
         │                      │   │                      │
         │                ┌─────┘   └─────┐                │
         │                │ Replication  │                │
         ▼                ▼              ▼                ▼
┌───────────────┐ ┌───────────────┐ ┌───────────────┐ ┌───────────────┐
│ Partition Log │ │ Partition Log │ │ Partition Log │ │ Partition Log │
│  Replica 1    │ │  Replica 2    │ │  Replica 3    │ │  Replica N    │
└───────────────┘ └───────────────┘ └───────────────┘ └───────────────┘
Myth Busters - 4 Common Misconceptions
Quick: Do you think Kafka deletes events immediately after consumers read them? Commit to yes or no.
Common Belief: Kafka removes events as soon as a consumer reads them to save space.
Reality: Kafka retains events for a configured time or size limit regardless of consumer reads, allowing multiple consumers to read independently and replay events.
Why it matters: Assuming immediate deletion leads to designs that lose data or cannot support late consumers.
Quick: Do you think Kafka guarantees global ordering of all events across topics? Commit to yes or no.
Common Belief: Kafka ensures all events across all topics are strictly ordered.
Reality: Kafka guarantees order only within each partition, not across different partitions or topics.
Why it matters: Expecting global order can cause bugs when applications assume events arrive in a total order they do not.
Quick: Do you think event streaming is only useful for big companies with huge data? Commit to yes or no.
Common Belief: Event streaming is only for large enterprises with massive data volumes.
Reality: Event streaming benefits any system needing real-time data flow, including small apps, IoT devices, and startups.
Why it matters: Believing this limits adoption and innovation in smaller projects that can gain from event streaming.
Quick: Do you think consumers must be always online to receive events? Commit to yes or no.
Common Belief: Consumers must be connected all the time to get events; otherwise, they miss them.
Reality: Kafka stores events so consumers can disconnect and later resume reading from where they left off.
Why it matters: Misunderstanding this leads to fragile systems that lose data or require complex buffering.
Expert Zone
1
Kafka's offset management allows consumers to control exactly which events they process, enabling complex retry and error handling strategies.
2
The choice of partition key affects load balancing and ordering guarantees, impacting system performance and correctness.
3
Kafka's log compaction feature allows retaining only the latest value per key, useful for stateful stream processing and reducing storage.
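Log compaction (point 3 above) is easy to sketch: scan the log of (key, value) records and keep only the newest value per key. A toy version:

```python
# An append-only log where later records supersede earlier ones per key.
log = [
    ("user-1", {"email": "a@old.example"}),
    ("user-2", {"email": "b@example"}),
    ("user-1", {"email": "a@new.example"}),   # supersedes the first record
]

def compact(records):
    latest = {}
    for key, value in records:    # later records overwrite earlier ones
        latest[key] = value
    return list(latest.items())

compacted = compact(log)
print(len(compacted))  # 2: only the latest value per key survives
```

This is why a compacted topic works as a durable key-value snapshot for stateful stream processors: replaying it rebuilds the latest state without replaying every historical update.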
When NOT to use
Event streaming is not ideal for simple request-response APIs or batch-only workflows. In such cases, traditional REST or batch ETL tools are simpler and more efficient.
Production Patterns
In production, Kafka is often used with schema registries to enforce data formats, with monitoring tools to track lag and throughput, and with stream processing frameworks to build event-driven microservices and real-time analytics pipelines.
Connections
Publish-Subscribe Messaging
Event streaming builds on and extends pub-sub by adding durable storage and replay capabilities.
Understanding pub-sub helps grasp how event streaming decouples producers and consumers but adds persistence and scalability.
Database Change Data Capture (CDC)
CDC captures database changes as event streams for real-time integration.
Knowing CDC shows how event streaming can keep systems synchronized by streaming data changes continuously.
Supply Chain Logistics
Both involve continuous flow and tracking of items (events or goods) through stages.
Seeing event streaming like supply chain logistics highlights the importance of order, tracking, and fault tolerance in moving data.
Common Pitfalls
#1 Assuming events are deleted after consumption, causing data loss.
Wrong approach: Kafka deletes messages immediately after a consumer reads them.
Correct approach: Kafka retains messages based on retention policy, allowing multiple consumers and replay.
Root cause: Misunderstanding Kafka's retention model and thinking it behaves like a traditional queue.
#2 Using a single partition for all events, limiting scalability.
Wrong approach: Creating a Kafka topic with only one partition for all data.
Correct approach: Designing topics with multiple partitions to enable parallel processing and higher throughput.
Root cause: Not realizing partitions enable Kafka's horizontal scaling.
#3 Ignoring consumer offset management, leading to duplicate processing.
Wrong approach: Consumers do not commit offsets, or commit them incorrectly, causing reprocessing or data loss.
Correct approach: Properly managing offsets to track processed events and ensure exactly-once or at-least-once processing.
Root cause: Lack of understanding of consumer state and offset semantics.
Key Takeaways
Event streaming treats data as a continuous flow of events, enabling real-time processing and responsiveness.
Kafka organizes events into topics and partitions to scale and maintain order within partitions.
Producers send events, and consumers read them independently, with Kafka storing events for replay and fault tolerance.
Understanding Kafka's internal replication and offset management is key to building reliable, scalable event-driven systems.
Event streaming is broadly useful beyond big companies and supports complex real-time applications when designed carefully.