
SparkSession and SparkContext in Apache Spark - Step-by-Step Execution

Concept Flow - SparkSession and SparkContext
Start Application
Create SparkSession
SparkSession creates SparkContext
Use SparkContext for low-level operations
Use SparkSession for high-level APIs
Perform Data Processing
Stop SparkSession (and SparkContext)
The flow shows a Spark application starting by creating a SparkSession, which internally creates a SparkContext. The SparkContext handles core, low-level tasks, while the SparkSession provides easy access to high-level features such as DataFrames.
Execution Sample
Python (PySpark)
from pyspark.sql import SparkSession

# Create (or reuse) a SparkSession named "TestApp"
spark = SparkSession.builder.appName("TestApp").getOrCreate()
# The SparkContext was created internally by the SparkSession
sc = spark.sparkContext
print(sc.appName)  # -> TestApp
spark.stop()  # stops the SparkSession and its SparkContext
This code creates a SparkSession, accesses the SparkContext it created internally, prints the application name, and then stops the session.
Execution Table
Step | Action | Object Created/Accessed | State/Value | Output
1 | Import SparkSession | SparkSession class | Available |
2 | Create SparkSession with appName 'TestApp' | SparkSession instance | spark created |
3 | SparkSession creates SparkContext internally | SparkContext instance | sc created with appName 'TestApp' |
4 | Access SparkContext from SparkSession | sc | Reference to SparkContext |
5 | Print sc.appName | sc.appName | TestApp | TestApp
6 | Stop SparkSession | spark | spark and sc stopped |
7 | End of program | | Execution complete |
💡 SparkSession stopped, releasing SparkContext and ending the application.
Variable Tracker
Variable | Start | After Step 2 | After Step 3 | After Step 4 | After Step 6
spark | None | SparkSession instance | SparkSession instance | SparkSession instance | Stopped
sc | None | None | SparkContext instance | SparkContext instance | Stopped
Key Moments - 2 Insights
Why do we access SparkContext from SparkSession instead of creating it directly?
SparkSession manages SparkContext internally to simplify setup. As shown in step 3 of the execution table, the SparkContext is created automatically when the SparkSession is created.
What happens when we stop the SparkSession?
Stopping the SparkSession also stops its SparkContext, releasing resources. See step 6 of the execution table, where spark.stop() stops both.
Visual Quiz - 3 Questions
Test your understanding
Looking at the execution table at step 5, what is printed?
A) "TestApp"
B) "spark"
C) "sc"
D) "None"
💡 Hint
Check the 'Output' column at step 5 in the execution table.
At which step is the SparkContext created?
A) Step 2
B) Step 3
C) Step 4
D) Step 6
💡 Hint
Look at the 'Action' and 'Object Created/Accessed' columns in the execution table.
If we do not call spark.stop(), what happens to SparkContext?
A) It stops automatically at program end
B) It is deleted immediately
C) It remains running and holds resources
D) It throws an error
💡 Hint
Refer to the Key Moments section on stopping SparkSession and releasing resources.
Concept Snapshot
SparkSession is the main entry point to Spark.
It creates and manages SparkContext internally.
Use SparkSession for high-level APIs like DataFrames.
Use SparkContext for low-level RDD operations.
Always stop SparkSession to free resources.
Full Transcript
This visual execution shows how a Spark application starts by creating a SparkSession. The SparkSession internally creates a SparkContext, which handles core Spark operations. We access SparkContext from SparkSession to perform low-level tasks. The example code creates a SparkSession named 'TestApp', accesses its SparkContext, prints the application name, and then stops the session. Stopping SparkSession also stops SparkContext, releasing resources. The execution table traces each step, showing object creation and state changes. Key moments clarify why SparkContext is accessed via SparkSession and what stopping the session does. The quiz tests understanding of these steps and resource management.