0
0
Snowflakecloud~10 mins

Data lineage tracking in Snowflake - Step-by-Step Execution

Choose your learning style9 modes available
Process Flow - Data lineage tracking
Start: Data Source
Data Ingested into Snowflake
Data Processing / Transformation
Lineage Metadata Captured
Lineage Stored in Metadata Tables
Query Lineage Information
Use Lineage for Impact Analysis / Auditing
Data lineage tracking follows data from its source through processing steps, capturing metadata at each stage to enable tracing and auditing.
Execution Sample
Snowflake
CREATE TABLE raw_sales AS SELECT * FROM external_sales;
CREATE TABLE processed_sales AS SELECT id, amount*1.1 AS amount FROM raw_sales;
-- Query lineage metadata
SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.LINEAGE WHERE OBJECT_NAME = 'PROCESSED_SALES';
This code ingests raw data, processes it, and queries Snowflake's lineage metadata to track data flow.
Process Table
StepActionObject AffectedLineage CapturedResult
1Create raw_sales from external_salesraw_salesSource: external_salesraw_sales table created with data from external_sales
2Create processed_sales from raw_salesprocessed_salesSource: raw_salesprocessed_sales table created with transformed data
3Query lineage metadata for processed_saleslineage metadataShows processed_sales depends on raw_sales which depends on external_salesLineage info retrieved
4End--Execution complete
💡 All steps executed; lineage metadata shows data flow from external_sales to processed_sales
Status Tracker
VariableStartAfter Step 1After Step 2Final
raw_salesundefinedTable with external_sales dataTable unchangedTable unchanged
processed_salesundefinedundefinedTable with transformed data from raw_salesTable unchanged
lineage metadataemptyemptypopulated with lineage info for processed_salespopulated
Key Moments - 2 Insights
How does Snowflake know which tables data came from?
Snowflake automatically captures lineage metadata during table creation and transformation steps, as shown in steps 1 and 2 of the execution_table.
Why query the lineage metadata after creating tables?
Querying lineage metadata (step 3) confirms the data flow and dependencies, helping understand impact and audit data origins.
Visual Quiz - 3 Questions
Test your understanding
Look at the execution_table at step 2, what is the source of processed_sales?
Araw_sales
Bprocessed_sales
Cexternal_sales
Dlineage metadata
💡 Hint
Check the 'Lineage Captured' column at step 2 in execution_table
At which step is the lineage metadata populated with information about processed_sales?
AStep 1
BStep 2
CStep 3
DStep 4
💡 Hint
Look at the 'lineage metadata' variable in variable_tracker after each step
If we skip creating raw_sales and create processed_sales directly from external_sales, how would the lineage change?
ANo lineage would be captured
Bprocessed_sales lineage would show raw_sales as source
Cprocessed_sales lineage would show external_sales as source
DLineage would show processed_sales as source
💡 Hint
Lineage tracks direct data sources; see how lineage changes with table dependencies in execution_table
Concept Snapshot
Data lineage tracking in Snowflake:
- Tracks data flow from source to target tables
- Captured automatically during table creation and transformations
- Stored in ACCOUNT_USAGE.LINEAGE view
- Enables impact analysis and auditing
- Query lineage metadata to understand dependencies
Full Transcript
Data lineage tracking in Snowflake means following data from where it starts to where it ends up. When you create tables or transform data, Snowflake records which tables depend on which. For example, creating a raw_sales table from external_sales captures that raw_sales comes from external_sales. Then creating processed_sales from raw_sales records that dependency. You can query the lineage metadata to see these relationships. This helps you understand where your data came from and what uses it. The execution steps show creating tables and querying lineage. Variables track table states and lineage info. Key moments clarify how lineage is captured and why querying it matters. The quiz tests understanding of lineage sources and metadata timing.