Recall & Review
beginner
What is SparkContext in Apache Spark?
SparkContext is the entry point to Spark functionality. It connects your program to the Spark cluster and allows you to create RDDs (Resilient Distributed Datasets). Think of it as the main controller for Spark jobs.
beginner
What is SparkSession and why is it important?
SparkSession is the unified entry point introduced in Spark 2.0. It combines the functionality of SQLContext and HiveContext and wraps SparkContext in a single object, so you can work with DataFrames, SQL, and streaming in one place.
intermediate
How do SparkSession and SparkContext relate to each other?
SparkSession internally manages a SparkContext. When you create a SparkSession, it creates or uses an existing SparkContext. You usually use SparkSession now because it handles more features and is simpler.
beginner
How do you create a SparkSession in PySpark?
You create a SparkSession using the builder pattern:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('MyApp').getOrCreate()
```

This starts Spark and lets you use DataFrames and SQL.
intermediate
Why should you prefer SparkSession over SparkContext in modern Spark applications?
SparkSession provides a unified interface for all Spark features like SQL, streaming, and machine learning. It simplifies code and manages SparkContext internally, so you don't have to handle multiple contexts separately.
What is the main role of SparkContext in Apache Spark?
SparkContext connects your program to the Spark cluster and lets you create RDDs for distributed computing.
Which object should you use to work with DataFrames and SQL in Spark 2.0 and later?
SparkSession is the unified entry point for DataFrames, SQL, and other Spark features since Spark 2.0.
How do you create a SparkSession in PySpark?
You create a SparkSession using the builder pattern with appName and getOrCreate() methods.
What does SparkSession internally manage?
SparkSession internally manages SparkContext to handle cluster connection and job execution.
Why is SparkSession preferred over SparkContext in modern Spark applications?
SparkSession unifies access to SQL, streaming, machine learning, and RDDs, simplifying Spark programming.
Explain the difference between SparkContext and SparkSession and when to use each.
Think about how Spark evolved from SparkContext to SparkSession.
Describe how to create a SparkSession in PySpark and why it is important.
Focus on the code and the benefits of SparkSession.