
SparkSession and SparkContext in Apache Spark

Introduction

SparkSession and SparkContext are the entry points for working with Apache Spark. They connect your program to Spark so you can use its APIs to analyze large datasets.

When you want to read data from files like CSV or JSON to analyze it.
When you need to create a Spark program to process large datasets.
When you want to run SQL queries on big data using Spark.
When you want to manage Spark configurations and resources in your program.
Syntax
Apache Spark
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("MyApp") \
    .getOrCreate()

sc = spark.sparkContext

SparkSession is the main entry point for working with Spark SQL and DataFrames.

SparkContext is the older, lower-level entry point (used for the RDD API); since Spark 2.0 it is normally accessed through a SparkSession.

Examples
Create a SparkSession named 'TestApp' and get its SparkContext.
Apache Spark
spark = SparkSession.builder.appName("TestApp").getOrCreate()
sc = spark.sparkContext
Create a SparkSession that runs in local mode on your computer, named 'LocalApp'.
Apache Spark
spark = SparkSession.builder.master("local").appName("LocalApp").getOrCreate()
Quickly get the SparkContext from a SparkSession.
Apache Spark
sc = SparkSession.builder.getOrCreate().sparkContext
Sample Program

This program creates a SparkSession and SparkContext, prints the app name, then creates and shows a small DataFrame.

Apache Spark
from pyspark.sql import SparkSession

# Create SparkSession
spark = SparkSession.builder.appName("ExampleApp").getOrCreate()

# Access SparkContext
sc = spark.sparkContext

# Print app name from SparkContext
print(f"App Name: {sc.appName}")

# Create a simple DataFrame
data = [(1, "Alice"), (2, "Bob"), (3, "Cathy")]
columns = ["id", "name"]
df = spark.createDataFrame(data, columns)

# Show DataFrame content
print("DataFrame content:")
df.show()
Important Notes

Create only one SparkSession per application to avoid configuration conflicts.

SparkContext is available inside SparkSession as spark.sparkContext.

Use SparkSession for all new Spark programs; SparkContext is mostly for backward compatibility.

Summary

SparkSession is the main way to work with Spark SQL and DataFrames.

SparkContext connects your program to the Spark cluster and is accessed via SparkSession.

Create a SparkSession with the builder, set the app name and master if needed, then get the SparkContext from it.