Apache Spark · Data · ~20 mins

Local mode vs cluster mode in Apache Spark - Practice Questions

Challenge - 5 Problems
🧠 Conceptual · Intermediate
Understanding Spark Execution Modes

Which statement correctly describes the difference between Spark's local mode and cluster mode?

A. Local mode is used only for production, while cluster mode is only for testing and development.
B. Local mode requires a cluster manager, but cluster mode runs without any cluster manager.
C. Local mode runs Spark on a single machine using one JVM, while cluster mode runs Spark across multiple machines with distributed resources.
D. Local mode supports distributed data processing, but cluster mode processes data only on a single node.
💡 Hint

Think about how many machines and JVMs are involved in each mode.
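The distinction the hint points at can be sketched in plain Python. This is an illustration of the master-URL schemes Spark documents, not Spark's own code; the helper function name is made up for this sketch.

```python
def execution_mode(master_url: str) -> str:
    """Classify a Spark master URL as local (single JVM) or cluster.

    Hypothetical helper for illustration -- not part of the PySpark API.
    """
    if master_url == "local" or master_url.startswith("local["):
        # All executors run as threads inside one JVM on one machine.
        return "local"
    if master_url.startswith(("spark://", "mesos://", "k8s://")) or master_url == "yarn":
        # Work is distributed across multiple machines via a cluster manager.
        return "cluster"
    return "unknown"

print(execution_mode("local[4]"))          # local
print(execution_mode("spark://m:7077"))    # cluster
print(execution_mode("yarn"))              # cluster
```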

Predict Output · Intermediate
Spark Master URL in Local Mode

What will be the output of the following Spark configuration code snippet?

from pyspark.sql import SparkSession
spark = SparkSession.builder.master('local[4]').appName('TestApp').getOrCreate()
print(spark.sparkContext.master)
A. local[4]
B. yarn
C. spark://master:7077
D. local[*]
💡 Hint

Check the master URL string passed to the builder.
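A `local[N]` master string encodes the number of worker threads (with `*` meaning all available cores). A small parser can make that structure explicit; this helper is hypothetical, written just for this illustration, not a PySpark function.

```python
import re

def local_threads(master: str):
    """Return the thread spec ('4', '*', ...) from a local-mode master
    string, or None if the string is not a local[...] URL.

    Hypothetical helper for illustration only.
    """
    m = re.fullmatch(r"local\[(\*|\d+)\]", master)
    if not m:
        return None
    return m.group(1)  # '*' means "use all available cores"

print(local_threads("local[4]"))   # 4
print(local_threads("yarn"))       # None
```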

Data Output · Advanced
Data Partitioning in Local vs Cluster Mode

Consider this Spark code that creates an RDD and counts partitions:

rdd = spark.sparkContext.parallelize(range(10), 3)
print(rdd.getNumPartitions())

What will be the output when running in local mode and cluster mode respectively?

A. 1 in local mode, 3 in cluster mode
B. 3 in both local and cluster mode
C. 3 in local mode, 1 in cluster mode
D. 1 in both local and cluster mode
💡 Hint

The number of partitions is set explicitly in the parallelize call.
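The slicing that `parallelize()` performs on an in-memory sequence can be approximated in plain Python. This mimic is a sketch, not Spark's internals, but it shows why the partition count follows the explicit argument regardless of execution mode.

```python
def slice_like_parallelize(data, num_slices):
    """Split a sequence into num_slices roughly equal contiguous slices
    (an approximation of how parallelize() slices local collections)."""
    n = len(data)
    return [data[i * n // num_slices:(i + 1) * n // num_slices]
            for i in range(num_slices)]

parts = slice_like_parallelize(list(range(10)), 3)
print(len(parts))                # 3 -- the explicit argument wins in any mode
print([len(p) for p in parts])   # [3, 3, 4]
```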

🔧 Debug · Advanced
Error in Cluster Mode Spark Job Submission

A user submits a Spark job with master URL 'local' but expects it to run on a cluster. The job fails with connection errors. What is the most likely cause?

A. The Spark version is incompatible with the cluster mode.
B. The cluster manager is down, causing connection errors.
C. The application name is missing, causing the job to fail.
D. Using 'local' master URL runs Spark only on the local machine, so it cannot connect to the cluster resources.
💡 Hint

Check what the 'local' master URL means for Spark execution.
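For comparison, here is how the two submissions differ on the command line. This is a sketch assuming a YARN cluster; `my_app.py` is a placeholder script name.

```shell
# Runs entirely on the submitting machine, in one JVM -- no cluster contacted:
spark-submit --master local[4] my_app.py

# Actually distributes work across the cluster (YARN assumed as the manager):
spark-submit --master yarn --deploy-mode cluster my_app.py
```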

🚀 Application · Expert
Choosing Execution Mode for a Large Dataset

You have a dataset of 5 TB stored on HDFS and want to run a Spark job to analyze it. Which execution mode should you choose and why?

A. Cluster mode, because it distributes the workload across multiple machines to handle large data efficiently.
B. Local mode, because it is simpler and faster for large datasets.
C. Local mode, because it uses multiple JVMs to speed up processing.
D. Cluster mode, because it runs Spark on a single machine with more memory.
💡 Hint

Think about the size of data and how Spark handles distributed processing.
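A back-of-the-envelope check makes the hint concrete. The single-node RAM figure below is an assumed typical value, not something given in the question.

```python
# How many times larger is 5 TB than one commodity node's memory?
dataset_tb = 5
dataset_gb = dataset_tb * 1024           # 5120 GB of input on HDFS
typical_node_ram_gb = 256                # assumption: one commodity node

ratio = dataset_gb / typical_node_ram_gb
print(ratio)  # 20.0 -- the data is ~20x one node's memory, so distribute it
```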