Complete the code to create a SparkSession running in local mode.
from pyspark.sql import SparkSession
spark = SparkSession.builder.master([1]).appName("LocalApp").getOrCreate()
Using "local" tells Spark to run locally on one machine.
Complete the code to read a CSV file using Spark in cluster mode.
df = spark.read.format("csv").option("header", "true").load([1])
In cluster mode, data is often read from distributed storage like HDFS, so the path starts with hdfs://.
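A completed version of the read might look like this sketch; the HDFS path below is hypothetical and should be replaced with your cluster's namenode address and file location:

```python
# Hypothetical path: "namenode:9000" and the file location are placeholders.
df = spark.read.format("csv") \
    .option("header", "true") \
    .load("hdfs://namenode:9000/data/input.csv")
```

The `header` option tells Spark to use the first row as column names; without it, columns are auto-named `_c0`, `_c1`, and so on.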
Fix the error in the SparkSession builder to run in cluster mode.
spark = SparkSession.builder.master([1]).appName("ClusterApp").getOrCreate()
To run Spark on a YARN cluster, the master should be set to "yarn".
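The corrected builder would then read as in this sketch. Note that running against YARN assumes the client can find the Hadoop configuration (typically via the HADOOP_CONF_DIR or YARN_CONF_DIR environment variable):

```python
from pyspark.sql import SparkSession

# "yarn" submits the application to the YARN resource manager
# described by the Hadoop configuration on this machine.
spark = SparkSession.builder \
    .master("yarn") \
    .appName("ClusterApp") \
    .getOrCreate()
```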
Fill both blanks to create a SparkContext for local mode with 4 threads.
from pyspark import SparkContext sc = SparkContext(master=[1], appName=[2])
Setting master to "local[4]" runs Spark locally with 4 threads. The app name can be any string, here "LocalApp".
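With both blanks filled in, a completed sketch looks like this (the app name is an arbitrary example string):

```python
from pyspark import SparkContext

# local[4] = run locally with 4 worker threads;
# local[*] would use one thread per available core.
sc = SparkContext(master="local[4]", appName="LocalApp")
```

Only one SparkContext can be active per JVM, so call `sc.stop()` before creating another.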
Fill all three blanks to create a SparkSession for cluster mode using standalone cluster manager.
spark = SparkSession.builder.master([1]).appName([2]).config("spark.submit.deployMode", [3]).getOrCreate()
To connect to a standalone Spark cluster, the master is set to the cluster URL, e.g. "spark://master:7077". Setting the deploy mode to "cluster" launches the driver on a worker node inside the cluster rather than on the client machine. The app name identifies the job in the cluster UI.
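All three blanks filled in give a sketch like the following; the host "master" and port 7077 are the conventional standalone-master defaults and should match your cluster:

```python
from pyspark.sql import SparkSession

# "spark://master:7077" is the standalone cluster manager's URL.
spark = SparkSession.builder \
    .master("spark://master:7077") \
    .appName("ClusterApp") \
    .config("spark.submit.deployMode", "cluster") \
    .getOrCreate()
```

In practice the deploy mode is usually chosen at submission time with `spark-submit --deploy-mode cluster`, since a script that builds its own SparkSession is already running its driver on the client.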