
SparkSession and SparkContext in Apache Spark

Introduction

SparkSession and SparkContext are the entry points for working with Apache Spark. They connect your program to Spark so you can use its APIs to analyze large datasets.

When you want to read data from files like CSV or JSON to analyze it.
When you need to create a Spark program to process large datasets.
When you want to run SQL queries on big data using Spark.
When you want to manage Spark configurations and resources in your program.
Syntax
Apache Spark
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("MyApp") \
    .getOrCreate()

sc = spark.sparkContext

SparkSession is the main entry point for working with Spark SQL and DataFrames.

SparkContext is the older, lower-level entry point (used for the RDD API); since Spark 2.0 it is normally accessed through a SparkSession.

Examples
Create a SparkSession named 'TestApp' and get its SparkContext.
Apache Spark
spark = SparkSession.builder.appName("TestApp").getOrCreate()
sc = spark.sparkContext
Create a SparkSession that runs in local mode on your computer, named 'LocalApp'.
Apache Spark
spark = SparkSession.builder.master("local").appName("LocalApp").getOrCreate()
Quickly get the SparkContext from a SparkSession.
Apache Spark
sc = SparkSession.builder.getOrCreate().sparkContext
Sample Program

This program creates a SparkSession and SparkContext, prints the app name, then creates and shows a small DataFrame.

Apache Spark
from pyspark.sql import SparkSession

# Create SparkSession
spark = SparkSession.builder.appName("ExampleApp").getOrCreate()

# Access SparkContext
sc = spark.sparkContext

# Print app name from SparkContext
print(f"App Name: {sc.appName}")

# Create a simple DataFrame
data = [(1, "Alice"), (2, "Bob"), (3, "Cathy")]
columns = ["id", "name"]
df = spark.createDataFrame(data, columns)

# Show DataFrame content
print("DataFrame content:")
df.show()
Important Notes

Create only one SparkSession per application to avoid configuration conflicts.

SparkContext is available inside SparkSession as spark.sparkContext.

Use SparkSession for all new Spark programs; SparkContext is mostly for backward compatibility.

Summary

SparkSession is the main way to work with Spark SQL and DataFrames.

SparkContext connects your program to the Spark cluster and is accessed via SparkSession.

Create a SparkSession with the builder, set the app name and master if needed, then get the SparkContext from it.