Which statement correctly describes the difference between Spark's local mode and cluster mode?
Think about how many machines and JVMs are involved in each mode.
Local mode runs Spark on one machine using a single JVM, suitable for testing or small data. Cluster mode distributes tasks across multiple machines, enabling large-scale data processing.
What will be the output of the following Spark configuration code snippet?
from pyspark.sql import SparkSession

spark = SparkSession.builder.master('local[4]').appName('TestApp').getOrCreate()
print(spark.sparkContext.master)
Check the master URL string passed to the builder.
The master URL 'local[4]' tells Spark to run locally with 4 worker threads. spark.sparkContext.master returns the URL exactly as it was passed to the builder, so the print statement outputs local[4].
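As an aside, the thread count can be read back out of a local[...] master URL. The function below is a pure-Python sketch covering the documented forms (local, local[N], local[*]); it is an illustration, not Spark's own parser.

```python
import os
import re

def local_thread_count(master):
    """Sketch: worker threads implied by a local master URL.

    Handles 'local', 'local[N]', and 'local[*]'; returns None for
    anything else. Not Spark's actual parsing logic.
    """
    m = re.fullmatch(r"local(?:\[(\*|\d+)\])?", master)
    if m is None:
        return None            # not a local master URL
    spec = m.group(1)
    if spec is None:
        return 1               # bare 'local' means one worker thread
    if spec == "*":
        return os.cpu_count()  # 'local[*]' uses one thread per core
    return int(spec)

print(local_thread_count("local[4]"))  # 4
```

So for the snippet above, 'local[4]' is both the string that gets printed and a request for 4 worker threads.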
Consider this Spark code that creates an RDD and counts partitions:
rdd = spark.sparkContext.parallelize(range(10), 3)
print(rdd.getNumPartitions())
What will be the output when running in local mode and cluster mode respectively?
The number of partitions is set explicitly in the parallelize call.
The number of partitions is controlled by the second argument to parallelize. It is 3 regardless of execution mode.
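To make this concrete, the split can be sketched in pure Python: when parallelizing a range of n elements into num_slices partitions, partition i covers roughly the slice from i*n//num_slices to (i+1)*n//num_slices. This mirrors Spark's slicing rule for illustration only; it is not Spark's code.

```python
def slice_range(n, num_slices):
    # Sketch: split a range of n elements into num_slices partitions,
    # where partition i covers [i*n//num_slices, (i+1)*n//num_slices).
    return [list(range(i * n // num_slices, (i + 1) * n // num_slices))
            for i in range(num_slices)]

parts = slice_range(10, 3)
print(len(parts))               # 3 -- the requested count, independent of mode
print([len(p) for p in parts])  # [3, 3, 4]
```

Whether the job runs in local mode or on a cluster only changes where those 3 partitions are processed, not how many there are.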
A user submits a Spark job with master URL 'local' but expects it to run on a cluster. The job fails with connection errors. What is the most likely cause?
Check what the 'local' master URL means for Spark execution.
The 'local' master URL tells Spark to run locally on one machine. It does not connect to any cluster, so connection errors occur if cluster resources are expected.
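One way to see the distinction is a small classifier over master URL schemes. This is a hedged pure-Python sketch, with an abridged list of the cluster schemes Spark documents (standalone, YARN, Kubernetes, Mesos); it is not Spark's own validation logic.

```python
def targets_cluster(master):
    # Sketch: does this master URL ask Spark to contact a cluster manager?
    # 'local' and 'local[...]' run everything in one JVM on this machine,
    # so no connection to external resources is ever attempted.
    if master == "local" or master.startswith("local["):
        return False
    # Abridged cluster schemes from the Spark docs.
    return master.startswith(("spark://", "yarn", "k8s://", "mesos://"))

print(targets_cluster("local"))              # False -> nothing to connect to
print(targets_cluster("spark://host:7077"))  # True
```

With master 'local', Spark never contacts a cluster manager, so the fix is to pass a cluster master URL (for example a standalone spark:// URL or yarn) when submitting the job.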
You have a dataset of 5 TB stored on HDFS and want to run a Spark job to analyze it. Which execution mode should you choose and why?
Think about the size of data and how Spark handles distributed processing.
Cluster mode is designed to process large datasets by distributing tasks across many machines. Local mode runs on one machine and is not suitable for very large data.