
SparkSession and SparkContext in Apache Spark - Cheat Sheet & Quick Revision

Recall & Review
beginner
What is SparkContext in Apache Spark?
SparkContext is the entry point to Spark functionality. It connects your program to the Spark cluster and allows you to create RDDs (Resilient Distributed Datasets). Think of it as the main controller for Spark jobs.
beginner
What is SparkSession and why is it important?
SparkSession is the unified entry point introduced in Spark 2.0. It combines the functionality of SQLContext and HiveContext and wraps the SparkContext, so you can work with DataFrames, SQL, and streaming from one object.
intermediate
How do SparkSession and SparkContext relate to each other?
SparkSession internally manages a SparkContext. When you create a SparkSession, it creates or uses an existing SparkContext. You usually use SparkSession now because it handles more features and is simpler.
beginner
How do you create a SparkSession in PySpark?
You create a SparkSession using the builder pattern:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('MyApp').getOrCreate()
```

This starts Spark and lets you use DataFrames and SQL.
intermediate
Why should you prefer SparkSession over SparkContext in modern Spark applications?
SparkSession provides a unified interface for all Spark features like SQL, streaming, and machine learning. It simplifies code and manages SparkContext internally, so you don't have to handle multiple contexts separately.
What is the main role of SparkContext in Apache Spark?
A. Connect your program to the Spark cluster and create RDDs
B. Manage SQL queries only
C. Visualize data in Spark
D. Store data permanently
Which object should you use to work with DataFrames and SQL in Spark 2.0 and later?
A. SparkContext
B. SQLContext
C. SparkSession
D. HiveContext
How do you create a SparkSession in PySpark?
A. spark = SparkContext()
B. spark = SparkSession.builder.appName('App').getOrCreate()
C. spark = SQLContext()
D. spark = HiveContext()
What does SparkSession internally manage?
A. Only SQLContext
B. No other context
C. Only HiveContext
D. SparkContext
Why is SparkSession preferred over SparkContext in modern Spark applications?
A. It provides a unified interface for multiple Spark features
B. It requires less memory
C. It is slower but more reliable
D. It only supports RDDs
Explain the difference between SparkContext and SparkSession and when to use each.
Hint: Think about how Spark evolved from SparkContext to SparkSession.
Describe how to create a SparkSession in PySpark and why it is important.
Hint: Focus on the code and the benefits of SparkSession.