
Spark architecture (driver, executors, cluster manager) in Apache Spark - Step-by-Step Execution

Concept Flow - Spark architecture (driver, executors, cluster manager)
User submits Spark job
Driver program starts
Driver requests resources from Cluster Manager
Cluster Manager allocates Executors
Executors run tasks
Executors send results back to Driver
Driver collects results and completes job
This flow shows how a Spark job moves from user submission through the driver, cluster manager, executors, and back to the driver for results.
Execution Sample
PySpark
# PySpark job: read a CSV, filter rows, count the matches
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('Example').getOrCreate()
data = spark.read.csv('data.csv')
result = data.filter("cast(_c1 as int) > 10").count()
print(result)
This code reads data.csv into a DataFrame, filters the rows where the second column (cast to an integer) is greater than 10, counts the matching rows, and prints the result.
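Stripped of the distributed machinery, the filter-and-count is equivalent to this single-machine Python sketch (the sample rows below are hypothetical stand-ins for data.csv):

```python
# Hypothetical rows standing in for data.csv: (_c0, _c1) string pairs
rows = [("a", "5"), ("b", "12"), ("c", "30"), ("d", "8")]

# Filter rows where the second column, cast to int, exceeds 10, then count
result = sum(1 for _, c1 in rows if int(c1) > 10)
print(result)  # prints 2: only the "12" and "30" rows pass the filter
```

Spark performs this same logical computation, but spread across executors working on separate partitions of the data.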
Execution Table
Step | Component | Action | Details | Output
1 | User | Submits job | Job with filter and count | Job sent to Driver
2 | Driver | Starts | Initializes SparkContext and DAG | Ready to schedule tasks
3 | Driver | Requests resources | Asks Cluster Manager for executors | Resource request sent
4 | Cluster Manager | Allocates executors | Assigns executors on worker nodes | Executors launched
5 | Executors | Run tasks | Filter and count tasks executed on data partitions | Partial counts computed
6 | Executors | Send results | Partial counts sent back to Driver | Partial results received
7 | Driver | Aggregates results | Sums partial counts | Final count computed
8 | Driver | Job complete | Prints final count | Output displayed to user
9 | - | Exit | Job finished successfully | -
💡 Job completes after driver aggregates results and prints output
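Steps 5 through 7 can be sketched in plain Python: each executor computes a partial count over its own partition, and the driver sums the partial counts into the final result (the partition contents below are hypothetical):

```python
# Hypothetical partitions of the _c1 column, one list per executor
partitions = [[5, 12, 30], [8, 41], [2, 19]]

# Step 5: each executor filters its partition and counts the matches
partial_counts = [sum(1 for v in part if v > 10) for part in partitions]

# Steps 6-7: partial counts travel back to the driver, which sums them
final_count = sum(partial_counts)
print(partial_counts, final_count)  # prints [2, 1, 1] 4
```

The key point this illustrates: no single executor ever sees the whole dataset; only the small partial counts cross the network back to the driver.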
Variable Tracker
Variable | Start | After Step 5 | After Step 7 | Final
spark | None | SparkSession active | SparkSession active | SparkSession active
data | None | Data loaded as DataFrame | DataFrame unchanged | DataFrame unchanged
partial_counts | None | List of counts from executors | Aggregated sum | Aggregated sum
final_count | None | None | Sum of partial counts | Printed output
Key Moments - 3 Insights
Why does the driver request resources from the cluster manager before running tasks?
The driver needs executors to run tasks. It asks the cluster manager to allocate these executors on worker nodes, as shown in steps 3 and 4 of the execution table.
What is the role of executors in Spark architecture?
Executors run the actual tasks on data partitions and send results back to the driver. This is shown in steps 5 and 6 where executors process data and return partial counts.
Why does the driver aggregate results after executors finish tasks?
Executors compute partial results on data partitions. The driver collects these partial results and combines them to get the final output, as seen in step 7.
Visual Quiz - 3 Questions
Test your understanding
Looking at the execution table, at which step does the cluster manager allocate executors?
A. Step 2
B. Step 4
C. Step 6
D. Step 8
💡 Hint
Check the 'Component' and 'Action' columns in the execution table for cluster manager activities.
According to the variable tracker, what is the state of 'partial_counts' after step 5?
A. List of counts from executors
B. None
C. Aggregated sum
D. Printed output
💡 Hint
Look at the 'partial_counts' row and the 'After Step 5' column in the variable tracker.
If the driver did not aggregate results, what would be missing in the execution table?
A. Executors running tasks
B. Cluster manager allocating executors
C. Final count computed
D. User submitting job
💡 Hint
Refer to step 7 in the execution table where the driver aggregates results.
Concept Snapshot
Spark architecture overview:
- Driver: coordinates job, creates tasks
- Cluster Manager: allocates executors
- Executors: run tasks on data partitions
- Data flows from user to driver, then executors, back to driver
- Driver aggregates results and completes job
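In code, the choice of cluster manager surfaces in the SparkSession's master setting. A configuration sketch (the master URLs are placeholders; substitute your own cluster's address):

```python
from pyspark.sql import SparkSession

# master() selects the cluster manager:
#   'local[*]'          - driver and executors in one process (no cluster manager)
#   'yarn'              - Hadoop YARN allocates the executors
#   'spark://host:7077' - Spark's standalone cluster manager
spark = (
    SparkSession.builder
    .appName('Example')
    .master('local[*]')  # placeholder; point this at your cluster manager
    .config('spark.executor.instances', '2')  # executor count (honored on YARN/Kubernetes)
    .getOrCreate()
)
```

With 'local[*]' the whole architecture collapses into one process, which is why local runs behave the same logically but skip steps 3 and 4 of the execution table.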
Full Transcript
In Spark architecture, the user submits a job that starts the driver program. The driver initializes and requests resources from the cluster manager. The cluster manager allocates executors on worker nodes. Executors run tasks on data partitions and send partial results back to the driver. The driver aggregates these results to produce the final output and completes the job. Variables like the SparkSession, data, partial counts, and final count change state through these steps. Key points include the driver's role in resource requests and result aggregation, and executors' role in task execution.