
SparkSession and SparkContext in Apache Spark - Step-by-Step Execution

Concept Flow - SparkSession and SparkContext
Start Application
Create SparkSession
SparkSession creates SparkContext
Use SparkContext for low-level operations
Use SparkSession for high-level APIs
Perform Data Processing
Stop SparkSession (and SparkContext)
The flow shows a Spark application starting by creating a SparkSession, which internally creates a SparkContext. The SparkContext handles core, low-level tasks, while the SparkSession provides easy access to high-level features such as DataFrames.
Execution Sample
Python (PySpark)
from pyspark.sql import SparkSession

# Create (or reuse) a SparkSession named "TestApp"
spark = SparkSession.builder.appName("TestApp").getOrCreate()
# The SparkContext was created internally by the SparkSession
sc = spark.sparkContext
print(sc.appName)  # -> TestApp
spark.stop()  # stops the SparkSession and its SparkContext
This code creates a SparkSession, accesses the SparkContext it created internally, prints the application name, and then stops the session.
Execution Table
Step | Action | Object Created/Accessed | State/Value | Output
1 | Import SparkSession | SparkSession class | Available |
2 | Create SparkSession with appName 'TestApp' | SparkSession instance | spark created |
3 | SparkSession creates SparkContext internally | SparkContext instance | sc created with appName 'TestApp' |
4 | Access SparkContext from SparkSession | sc | Reference to SparkContext |
5 | Print sc.appName | sc.appName | TestApp | TestApp
6 | Stop SparkSession | spark | spark and sc stopped |
7 | End of program | | Execution complete |
💡 SparkSession stopped, releasing SparkContext and ending the application.
Variable Tracker
Variable | Start | After Step 2 | After Step 3 | After Step 4 | After Step 6
spark | None | SparkSession instance | SparkSession instance | SparkSession instance | Stopped
sc | None | None | SparkContext instance | SparkContext instance | Stopped
Key Moments - 2 Insights
Why do we access SparkContext from SparkSession instead of creating it directly?
SparkSession manages SparkContext internally to simplify setup. As shown in step 3 of the execution table, the SparkContext is created automatically when the SparkSession is created.
What happens when we stop the SparkSession?
Stopping the SparkSession also stops its SparkContext, releasing resources. See step 6 of the execution table, where spark.stop() stops both.
Visual Quiz - 3 Questions
Test your understanding
Looking at the execution table at step 5, what is printed?
A) "TestApp"
B) "spark"
C) "sc"
D) "None"
💡 Hint
Check the 'Output' column at step 5 in the execution table.
At which step is the SparkContext created?
A) Step 2
B) Step 3
C) Step 4
D) Step 6
💡 Hint
Look at the 'Action' and 'Object Created/Accessed' columns in the execution table.
If we do not call spark.stop(), what happens to SparkContext?
A) It stops automatically at program end
B) It is deleted immediately
C) It remains running and holds resources
D) It throws an error
💡 Hint
Refer to the Key Moments section on stopping SparkSession and releasing resources.
Concept Snapshot
SparkSession is the main entry point to Spark.
It creates and manages SparkContext internally.
Use SparkSession for high-level APIs like DataFrames.
Use SparkContext for low-level RDD operations.
Always stop SparkSession to free resources.
Full Transcript
This visual execution shows how a Spark application starts by creating a SparkSession. The SparkSession internally creates a SparkContext, which handles core Spark operations. We access SparkContext from SparkSession to perform low-level tasks. The example code creates a SparkSession named 'TestApp', accesses its SparkContext, prints the application name, and then stops the session. Stopping SparkSession also stops SparkContext, releasing resources. The execution table traces each step, showing object creation and state changes. Key moments clarify why SparkContext is accessed via SparkSession and what stopping the session does. The quiz tests understanding of these steps and resource management.