SparkSession and SparkContext in Apache Spark

SparkSession and SparkContext are the entry points for working with Apache Spark: they connect your program to Spark and expose its APIs for analyzing large datasets.
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("MyApp") \
    .getOrCreate()

sc = spark.sparkContext
SparkSession is the main entry point to work with Spark SQL and DataFrames.
SparkContext is the older, lower-level connection to the Spark cluster; since Spark 2.0 it is typically accessed through a SparkSession.
# Minimal session with an application name
spark = SparkSession.builder.appName("TestApp").getOrCreate()
sc = spark.sparkContext

# Session with an explicit master URL
spark = SparkSession.builder.master("local").appName("LocalApp").getOrCreate()

# One-liner: get the SparkContext directly from a session
sc = SparkSession.builder.getOrCreate().sparkContext
This program creates a SparkSession, accesses its SparkContext, prints the app name, then creates and shows a small DataFrame.
from pyspark.sql import SparkSession

# Create SparkSession
spark = SparkSession.builder.appName("ExampleApp").getOrCreate()

# Access SparkContext
sc = spark.sparkContext

# Print app name from SparkContext
print(f"App Name: {sc.appName}")

# Create a simple DataFrame
data = [(1, "Alice"), (2, "Bob"), (3, "Cathy")]
columns = ["id", "name"]
df = spark.createDataFrame(data, columns)

# Show DataFrame content
print("DataFrame content:")
df.show()
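Run locally, the program should print the app name followed by the DataFrame rendered in df.show()'s standard table format, along these lines:

App Name: ExampleApp
DataFrame content:
+---+-----+
| id| name|
+---+-----+
|  1|Alice|
|  2|  Bob|
|  3|Cathy|
+---+-----+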
Create only one SparkSession per application to avoid conflicts; getOrCreate() returns the existing active session instead of building a duplicate, as the example below shows.
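A quick way to see this guarantee in action; the app names here are just placeholders:

from pyspark.sql import SparkSession

# The first call creates the session; the second finds it already active
# and returns the same object instead of creating a conflicting one.
spark1 = SparkSession.builder.appName("FirstApp").getOrCreate()
spark2 = SparkSession.builder.appName("SecondApp").getOrCreate()

print(spark1 is spark2)  # True: both variables point to the same session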
SparkContext is available inside SparkSession as spark.sparkContext.
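Because the SparkContext rides along inside the session, you can use it for low-level RDD work without creating it separately. A small sketch (the numbers are arbitrary sample data):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("RDDExample").getOrCreate()
sc = spark.sparkContext  # the context embedded in the session

# SparkContext drives the low-level RDD API
rdd = sc.parallelize([1, 2, 3, 4, 5])
print(rdd.map(lambda x: x * 2).collect())  # [2, 4, 6, 8, 10]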
Use SparkSession for all new Spark programs; SparkContext is mostly for backward compatibility.
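For comparison, here is the legacy pattern next to the modern one; the app name is illustrative. A JVM allows only one active SparkContext, so SparkSession.builder.getOrCreate() reuses an existing context rather than replacing it.

# Legacy entry point (pre-2.0 style, kept for backward compatibility)
from pyspark import SparkContext
sc = SparkContext(master="local", appName="OldStyleApp")

# Modern entry point: build a SparkSession, which wraps the running context
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
print(spark.sparkContext is sc)  # True: the session reuses the existing context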
SparkSession is the main way to work with Spark SQL and DataFrames.
SparkContext connects your program to the Spark cluster and is accessed via SparkSession.
Create a SparkSession with the builder, set the app name and master if needed, then get the SparkContext from it.
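Putting that summary together in one sketch (the master URL and app name are example values):

from pyspark.sql import SparkSession

# Build the session: app name plus an optional master URL,
# then pull the SparkContext out of the finished session.
spark = (
    SparkSession.builder
    .master("local[*]")        # run locally on all available cores
    .appName("ConfiguredApp")
    .getOrCreate()
)

sc = spark.sparkContext
print(sc.master, sc.appName)   # confirms the settings took effect

spark.stop()  # release resources when the job is done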