
Lazy Evaluation in Apache Spark - Step-by-Step Execution

Concept Flow - Lazy evaluation in Spark
Define transformations
Build DAG (Directed Acyclic Graph)
Trigger action
Execute DAG
Return results
Spark waits to run computations until an action is called, building a plan (DAG) first, then executing it.
Execution Sample
Apache Spark
rdd = sc.parallelize([1, 2, 3, 4])         # sc is an existing SparkContext
mapped = rdd.map(lambda x: x * 2)          # transformation: nothing runs yet
filtered = mapped.filter(lambda x: x > 4)  # transformation: nothing runs yet
result = filtered.collect()                # action: executes the DAG, returns [6, 8]
Defines transformations on an RDD, but Spark only runs them when the collect() action is called.
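To see how a plan can be recorded without being run, here is a minimal pure-Python sketch of the same idea. The `LazyPipeline` class is hypothetical and is not how Spark is actually implemented; it only illustrates the pattern of storing transformations as a plan and executing them on an action:

```python
# Minimal sketch (not Spark's real implementation): transformations only
# extend a recorded plan; the collect() action executes the whole plan.
class LazyPipeline:
    def __init__(self, data, steps=None):
        self.data = data
        self.steps = steps or []   # the "DAG": an ordered plan of steps

    def map(self, fn):             # transformation: just record it
        return LazyPipeline(self.data, self.steps + [("map", fn)])

    def filter(self, fn):          # transformation: just record it
        return LazyPipeline(self.data, self.steps + [("filter", fn)])

    def collect(self):             # action: run every recorded step now
        out = self.data
        for kind, fn in self.steps:
            out = [fn(x) for x in out] if kind == "map" else [x for x in out if fn(x)]
        return out

rdd = LazyPipeline([1, 2, 3, 4])
filtered = rdd.map(lambda x: x * 2).filter(lambda x: x > 4)
print(filtered.collect())  # [6, 8]
```

Note that `map` and `filter` return new `LazyPipeline` objects without touching the data, mirroring how Spark transformations return new RDDs.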
Execution Table
| Step | Operation | Action Triggered? | DAG State | Execution | Output |
| --- | --- | --- | --- | --- | --- |
| 1 | Create RDD from list [1, 2, 3, 4] | No | DAG with 1 node (source) | No execution | No output |
| 2 | Map: multiply each element by 2 | No | DAG with 2 nodes (source -> map) | No execution | No output |
| 3 | Filter: keep elements > 4 | No | DAG with 3 nodes (source -> map -> filter) | No execution | No output |
| 4 | Collect action called | Yes | DAG ready | Execute all transformations | [6, 8] |
💡 Execution happens only at step 4 when collect() triggers the DAG run
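The same "nothing runs until you ask for results" behavior can be observed in plain Python, since the built-in `map` and `filter` are also lazy iterators. This is an analogy, not Spark itself; the `calls` list is just instrumentation to record when work actually happens:

```python
calls = []

def double(x):
    calls.append(x)   # record the moment the work is actually done
    return x * 2

mapped = map(double, [1, 2, 3, 4])          # lazy: nothing computed yet
filtered = filter(lambda x: x > 4, mapped)  # still lazy
assert calls == []                          # no work has happened

result = list(filtered)                     # the "action": forces evaluation
assert result == [6, 8]
assert calls == [1, 2, 3, 4]                # all work happened here, at once
```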
Variable Tracker
| Variable | Start | After Step 1 | After Step 2 | After Step 3 | After Step 4 |
| --- | --- | --- | --- | --- | --- |
| rdd | undefined | RDD with data [1, 2, 3, 4] | RDD with data [1, 2, 3, 4] | RDD with data [1, 2, 3, 4] | RDD with data [1, 2, 3, 4] |
| mapped | undefined | undefined | RDD with map transformation | RDD with map transformation | RDD with map transformation |
| filtered | undefined | undefined | undefined | RDD with filter transformation | RDD with filter transformation |
| result | undefined | undefined | undefined | undefined | [6, 8] |
Key Moments - 3 Insights
Why don't the transformations run immediately when defined?
Because Spark uses lazy evaluation: it builds a plan (DAG) of transformations but waits to run them until an action like collect() is called (see the execution table, step 4).
What triggers the actual computation in Spark?
An action such as collect(), count(), or saveAsTextFile() triggers execution of all prior transformations in the DAG (see the execution table, step 4).
What is the benefit of building a DAG before execution?
It allows Spark to optimize the whole computation plan before running, improving efficiency and reducing unnecessary work.
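One concrete win from planning before executing is pipelining: consecutive map steps can be fused into a single pass over the data instead of two. A hypothetical pure-Python sketch of that idea (`fuse` is not a Spark API):

```python
# Hypothetical sketch of pipelining: compose two map functions into one,
# so the data is traversed once instead of twice.
def fuse(f, g):
    return lambda x: g(f(x))

double = lambda x: x * 2
add_one = lambda x: x + 1

fused = fuse(double, add_one)  # equivalent to map(double) then map(add_one)
result = [fused(x) for x in [1, 2, 3]]
print(result)  # [3, 5, 7]
```

Because Spark sees the whole DAG before running anything, it can apply this kind of fusion (and drop work no action ever needs) automatically.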
Visual Quiz - 3 Questions
Test your understanding
Looking at the execution table, at which step does Spark actually run the transformations?
A. Step 3
B. Step 4
C. Step 2
D. Step 1
💡 Hint
Check the 'Action Triggered?' and 'Execution' columns in the execution table.
According to the variable tracker, what is the value of 'result' before step 4?
A. [2, 4, 6, 8]
B. [6, 8]
C. Undefined
D. [1, 2, 3, 4]
💡 Hint
Look at the 'result' row in the variable tracker before step 4.
If we remove the collect() action, what happens to the DAG execution?
A. It never runs
B. It runs immediately after each transformation
C. It runs only for the first transformation
D. It runs twice
💡 Hint
Refer to the Concept Snapshot and the concept of lazy evaluation in the Concept Flow.
Concept Snapshot
Lazy evaluation in Spark means transformations build a plan (DAG) but do not run immediately.
Actions like collect() trigger execution of all transformations.
This allows Spark to optimize and run efficiently.
Transformations are 'lazy'; actions are 'eager'.
Full Transcript
In Spark, when you write code to transform data, Spark does not run those steps right away. Instead, it remembers the steps you want to perform and builds a plan called a DAG. This plan shows how data flows through each transformation. Only when you ask for a result with an action like collect() does Spark run all the steps together. This is called lazy evaluation. It helps Spark run faster by optimizing the whole plan before doing any work. For example, if you create an RDD, map it, and filter it, Spark waits until you call collect() to actually process the data and give you the output.