
Streams for change data capture in Snowflake - Deep Dive

Overview - Streams for change data capture
What is it?
Streams in Snowflake are a way to track changes made to a table over time. They record inserts, updates, and deletes so you can see what changed since the last time you checked. This helps you capture data changes without scanning the entire table again. Streams make it easier to build processes that react only to new or changed data.
Why it matters
Without streams, you would have to scan entire tables repeatedly to find what changed, which wastes time and computing power. Streams let you focus only on new or updated data, making data processing faster and cheaper. This is important for keeping data fresh in reports, syncing systems, or triggering actions based on changes.
Where it fits
Before learning streams, you should understand basic Snowflake tables and SQL queries. After streams, you can learn about tasks and pipes to automate processing of changed data. Streams fit into the data pipeline as the change detector that feeds downstream processes.
Mental Model
Core Idea
A stream is like a bookmark that remembers where you left off reading changes in a table, so you only see new changes each time.
Think of it like...
Imagine reading a newspaper every day. Instead of reading the whole paper again, you mark the last article you read. The next day, you start reading only the new articles published since your bookmark. Streams work the same way for data changes.
┌───────────────┐       ┌───────────────┐
│   Table Data  │──────▶│   Stream      │
│ (full records)│       │ (records only │
│               │       │  changes)     │
└───────────────┘       └───────────────┘
         ▲                      │
         │                      ▼
   Data changes          Query stream for
   (insert/update/      new changes since
    delete)             last read position
Build-Up - 6 Steps
1
Foundation: What is a Snowflake Stream
🤔
Concept: Introduce the basic idea of a stream as a change tracker on a table.
A Snowflake stream is an object that tracks changes made to a table. When you create a stream on a table, Snowflake records every insert, update, and delete that happens. The stream does not store the full data but keeps a record of what changed since the last time you read from it.
Result
You get a way to query only the new or changed rows without scanning the whole table.
Understanding that streams track changes incrementally helps you avoid expensive full table scans.
2
Foundation: Types of Streams in Snowflake
🤔
Concept: Explain the different stream types and their behavior.
Snowflake supports two main stream types on tables: standard streams, which track inserts, updates, and deletes, and append-only streams, which track only inserts. (A third variant, insert-only, serves external tables.) Which type to use depends on your data change patterns and use case.
Result
You can choose the right stream type to efficiently capture the changes you care about.
Knowing stream types helps you optimize for your specific change data capture needs.
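A minimal sketch of creating each type, assuming a hypothetical orders table (all names here are illustrative):

```sql
-- Standard stream: records inserts, updates, and deletes
CREATE STREAM orders_std ON TABLE orders;

-- Append-only stream: records inserts only; updates and deletes are ignored
CREATE STREAM orders_ins ON TABLE orders APPEND_ONLY = TRUE;
```

Append-only streams are cheaper to process when the source only ever receives inserts, such as an event or log table.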
3
Intermediate: How to Create and Query a Stream
🤔 Before reading on: Do you think querying a stream returns full table data or only changes? Commit to your answer.
Concept: Learn the syntax to create a stream and how to query it for changes.
To create a stream, use: CREATE STREAM my_stream ON TABLE my_table; Query the stream like a table: SELECT * FROM my_stream; This returns only the rows that changed since the stream's current offset. Note that a plain SELECT does not move the offset: the offset advances only when the stream is read by a DML statement (such as an INSERT or MERGE) inside a committed transaction.
Result
You get a result set containing only the rows inserted, updated, or deleted since the stream's current offset.
Understanding that streams behave like tables but only show changes is key to using them effectively.
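The syntax above, sketched end to end with a hypothetical orders table. The METADATA$ columns are the extra columns Snowflake attaches to every stream query:

```sql
CREATE TABLE orders (id INT, amount NUMBER);
CREATE STREAM orders_stream ON TABLE orders;

INSERT INTO orders VALUES (1, 100), (2, 250);

-- Returns only the two inserted rows, plus change metadata
SELECT id,
       amount,
       METADATA$ACTION,    -- 'INSERT' or 'DELETE'
       METADATA$ISUPDATE,  -- TRUE when the row is one side of an UPDATE
       METADATA$ROW_ID     -- stable identifier for tracking a row over time
FROM orders_stream;
```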
4
Intermediate: Using Streams for Incremental Data Processing
🤔 Before reading on: Do you think streams automatically update downstream tables or require manual processing? Commit to your answer.
Concept: Streams enable incremental data pipelines by feeding only changed data to downstream processes.
You can write SQL or tasks that read from streams and apply changes to other tables or systems. For example, you can insert new rows from the stream into a reporting table. This avoids reprocessing unchanged data and keeps pipelines efficient.
Result
Your data pipelines become faster and cheaper by processing only changes.
Knowing streams support incremental processing helps you design scalable data workflows.
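A sketch of the incremental pattern described above, assuming a hypothetical orders_report target table. Wrapping the read in a DML transaction is what marks the changes as consumed:

```sql
-- Copy only newly inserted rows into the reporting table.
-- The committed INSERT advances the stream offset, so the
-- same rows are not delivered again on the next run.
BEGIN;
INSERT INTO orders_report (id, amount)
SELECT id, amount
FROM orders_stream
WHERE METADATA$ACTION = 'INSERT';
COMMIT;
```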
5
Advanced: Handling Stream Offsets and Data Consistency
🤔 Before reading on: Do you think reading from a stream consumes the changes or leaves them for next reads? Commit to your answer.
Concept: Streams maintain offsets that move forward when changes are consumed, ensuring each change is processed once.
The offset advances when the stream is read by a DML statement in a transaction that commits; at that point the changes count as consumed. A plain SELECT leaves the offset untouched, so you can inspect pending changes safely. Unconsumed changes remain available, but only within the source table's data retention period: a stream left unread too long becomes stale. Consuming the stream inside a transaction gives you exactly-once processing, and you can reposition the offset by recreating the stream if needed.
Result
You get consistent, non-duplicated change data for reliable pipelines.
Understanding offset management prevents data loss or duplication in change capture.
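The consumption rule can be seen with a small experiment (table and stream names are hypothetical):

```sql
-- Plain SELECTs do not consume: both queries return the same change set
SELECT * FROM orders_stream;
SELECT * FROM orders_stream;  -- same rows again

-- A committed DML read is what moves the offset
BEGIN;
INSERT INTO orders_archive SELECT id, amount FROM orders_stream;
COMMIT;

SELECT * FROM orders_stream;  -- now empty until new changes arrive
```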
6
Expert: Streams with Tasks and Pipes for Automation
🤔 Before reading on: Do you think streams alone can automate data movement or need other components? Commit to your answer.
Concept: Streams combined with tasks and pipes enable fully automated, continuous change data capture pipelines.
Tasks can schedule SQL statements that read from streams and apply changes. Pipes can load data from external stages. Together, they automate incremental data movement without manual intervention. This is how production CDC pipelines run smoothly in Snowflake.
Result
You achieve hands-free, real-time data synchronization and processing.
Knowing how streams integrate with tasks and pipes unlocks powerful automation capabilities.
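A sketch of such a pipeline, assuming hypothetical warehouse, stream, and table names. The WHEN clause keeps the task from spinning up a warehouse when there is nothing to do:

```sql
CREATE TASK process_orders
  WAREHOUSE = my_wh
  SCHEDULE  = '5 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('ORDERS_STREAM')
AS
  INSERT INTO orders_report (id, amount)
  SELECT id, amount
  FROM orders_stream
  WHERE METADATA$ACTION = 'INSERT';

-- Tasks are created suspended; resume to start the schedule
ALTER TASK process_orders RESUME;
```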
Under the Hood
Internally, a stream does not copy rows into a separate change table. It stores little more than an offset: a pointer into the source table's version history. Snowflake records change-tracking metadata on the table's micro-partitions; when you query the stream, it computes the delta between its offset and the table's current version and returns those rows. Consuming the stream in a committed DML transaction moves the offset forward. This avoids scanning the full table and avoids storing duplicate data.
Why designed this way?
Streams were designed to provide efficient, incremental change capture without duplicating full data or requiring complex triggers. Snowflake's architecture separates storage and compute, so streams leverage metadata tracking to minimize compute costs and latency. Alternatives like triggers or full table scans were too costly or complex.
┌───────────────┐      ┌───────────────┐      ┌───────────────┐
│   Source      │      │ Change-Track. │      │   Stream      │
│   Table       │─────▶│   Metadata    │─────▶│  Offset Track │
│ (full data)   │      │ (partitions)  │      │  & Query API  │
└───────────────┘      └───────────────┘      └───────────────┘
Myth Busters - 4 Common Misconceptions
Quick: Does querying a stream return the full table data or only changes? Commit to your answer.
Common Belief: Querying a stream returns the entire table data like a normal SELECT.
Tap to reveal reality
Reality: Querying a stream returns only the rows that have changed since the stream's current offset.
Why it matters: Believing streams return full data leads to inefficient queries and misunderstanding of incremental processing benefits.
Quick: Do streams store full copies of changed rows or just metadata? Commit to your answer.
Common Belief: Streams store full copies of all changed rows, duplicating data.
Tap to reveal reality
Reality: Streams store only metadata about changes, not full data copies, making them lightweight and efficient.
Why it matters: Thinking streams duplicate data can cause unnecessary storage concerns and design hesitation.
Quick: Does a plain SELECT from a stream consume the changes? Commit to your answer.
Common Belief: Any read from a stream advances its offset, so each change is returned only once.
Tap to reveal reality
Reality: A plain SELECT leaves the offset untouched and returns the same changes again; the offset advances only when the stream is read by a DML statement inside a committed transaction.
Why it matters: Misunderstanding offset consumption can cause duplicate processing or missed changes in pipelines.
Quick: Can streams track changes on views or external tables exactly like on a table? Commit to your answer.
Common Belief: Streams behave identically on tables, views, and external tables.
Tap to reveal reality
Reality: Standard streams require regular tables. Views support streams only when change tracking is enabled on their underlying tables, and external tables support only insert-only streams.
Why it matters: Expecting full stream behavior on unsupported or restricted objects leads to failed implementations and confusion.
Expert Zone
1
Streams do not store data themselves but rely on Snowflake's underlying micro-partition metadata to track changes efficiently.
2
Offset management in streams is crucial for exactly-once processing; careless offset resets can cause data loss or duplication.
3
Streams combined with tasks enable event-driven architectures inside Snowflake without external orchestration tools.
When NOT to use
Streams are not suitable when you need to capture changes from external systems or non-Snowflake tables. In such cases, use external CDC tools or Snowflake's Snowpipe for continuous data ingestion.
Production Patterns
In production, streams are often paired with scheduled tasks that read changes and merge them into target tables. This pattern supports incremental ETL pipelines, real-time analytics, and data synchronization across systems.
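A common shape for that merge step, sketched with hypothetical names. The subquery drops the "before" image of each update so that every source row maps to exactly one action:

```sql
MERGE INTO orders_target t
USING (
    SELECT *
    FROM orders_stream
    -- An UPDATE surfaces as a DELETE row plus an INSERT row,
    -- both flagged METADATA$ISUPDATE; keep only the INSERT side.
    WHERE NOT (METADATA$ACTION = 'DELETE' AND METADATA$ISUPDATE)
) s
ON t.id = s.id
WHEN MATCHED AND s.METADATA$ACTION = 'DELETE' THEN
    DELETE
WHEN MATCHED AND s.METADATA$ACTION = 'INSERT' THEN
    UPDATE SET t.amount = s.amount
WHEN NOT MATCHED AND s.METADATA$ACTION = 'INSERT' THEN
    INSERT (id, amount) VALUES (s.id, s.amount);
```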
Connections
Event Sourcing
Streams implement a similar pattern by recording changes as events that can be replayed or processed incrementally.
Understanding streams as event logs helps grasp how state changes can be tracked and rebuilt over time.
Message Queues
Streams act like a message queue for database changes, delivering change events to consumers in order.
Seeing streams as queues clarifies their role in decoupling data producers and consumers in pipelines.
Version Control Systems
Streams track changes over time similar to how version control tracks code changes with commits and diffs.
This connection helps appreciate the importance of offsets as checkpoints to avoid reprocessing or missing changes.
Common Pitfalls
#1 Assuming a plain SELECT consumes the stream, leading to duplicate processing.
Wrong approach: SELECT * FROM my_stream; -- process rows SELECT * FROM my_stream; -- returns the SAME rows, processed again
Correct approach: BEGIN; INSERT INTO target_table SELECT * FROM my_stream; COMMIT; -- the committed DML advances the offset
Root cause: Not understanding that the offset advances only when the stream is read by DML inside a committed transaction, never on a plain SELECT.
#2 Creating a stream on an object that does not support it, or supports it only with restrictions.
Wrong approach: CREATE STREAM my_stream ON VIEW my_view; -- fails if change tracking is not enabled on the view's underlying tables
Correct approach: ALTER TABLE base_table SET CHANGE_TRACKING = TRUE; then create the stream on the view, or create it directly on the base table: CREATE STREAM my_stream ON TABLE base_table;
Root cause: Streams depend on change-tracking metadata; views and external tables carry restrictions (external tables support only insert-only streams) that base tables do not.
#3 Recreating a stream and silently discarding unprocessed changes.
Wrong approach: CREATE OR REPLACE STREAM my_stream ON TABLE my_table; -- resets the offset to now, losing any unconsumed changes
Correct approach: Consume outstanding changes first; if you must reposition, recreate the stream with an AT or BEFORE clause to set the offset deliberately.
Root cause: Replacing a stream resets its offset; manual repositioning relies on Time Travel semantics and must be done with care.
Key Takeaways
Streams in Snowflake track only the changes made to tables, enabling efficient incremental data processing.
They maintain an internal offset to remember which changes have been consumed; the offset advances on committed DML reads (not plain SELECTs), preventing duplicate processing.
Choosing the right stream type and managing offsets carefully is essential for reliable change data capture.
Streams integrate with tasks and pipes to automate continuous data pipelines inside Snowflake.
Understanding streams as change trackers rather than full data stores helps design scalable and cost-effective data workflows.