Challenge - 5 Problems

🎖️

Cloud Spark Mastery

Get all challenges correct to earn this badge!

Test your skills under time pressure!

🧠 Conceptual

intermediate

Why does cloud storage simplify Spark data access?

Which of the following best explains why using cloud storage simplifies data access in Spark?

A. Cloud storage requires manual data replication to each Spark node for faster access.

B. Cloud storage automatically converts data formats to Spark's internal format.

C. Cloud storage provides a centralized, scalable location accessible by all Spark nodes without manual data copying.

D. Cloud storage limits data size to small chunks to improve Spark performance.
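
As a rough illustration of why no copying is needed: Spark reads cloud storage through Hadoop-compatible filesystem connectors chosen by the URI scheme, so every executor that opens the same `s3a://`, `gs://`, or `abfss://` path talks to the same central store. The sketch below is a simplified lookup table, not Spark's actual resolution code (which goes through `fs.<scheme>.impl` settings and service loading and needs the matching connector JAR on the classpath). The class names are the standard connector classes; the helper names and the bucket path are hypothetical.

```python
from typing import List, Tuple
from urllib.parse import urlparse

# Simplified map of URI scheme -> Hadoop FileSystem class that usually serves it.
# Real clusters resolve this via fs.<scheme>.impl / service loading and need the
# matching connector (hadoop-aws, gcs-connector, hadoop-azure) installed.
CONNECTORS = {
    "s3a": "org.apache.hadoop.fs.s3a.S3AFileSystem",
    "gs": "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem",
    "abfss": "org.apache.hadoop.fs.azurebfs.SecureAzureBlobFileSystem",
    "hdfs": "org.apache.hadoop.hdfs.DistributedFileSystem",
    "file": "org.apache.hadoop.fs.LocalFileSystem",
}


def connector_for(uri: str) -> str:
    """Return the FileSystem class name for a storage URI (illustrative only)."""
    scheme = urlparse(uri).scheme or "file"  # bare paths are treated as local files
    try:
        return CONNECTORS[scheme]
    except KeyError:
        raise ValueError(f"no connector registered for scheme {scheme!r}") from None


def executor_view(uri: str, num_executors: int) -> List[Tuple[int, str, str]]:
    """Every executor resolves the same URI to the same shared store; nothing is copied."""
    return [(i, uri, connector_for(uri)) for i in range(num_executors)]


# Hypothetical bucket and prefix, for illustration only.
for executor_id, path, fs_class in executor_view("s3a://example-bucket/events/", 3):
    print(executor_id, path, fs_class)
```

Switching clouds in this model only changes the URI scheme; the executors still share one location instead of holding private copies.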

❓ Predict Output

intermediate

Output of Spark cluster resource allocation in the cloud

What is the output of the following Spark code snippet when run on a cloud-managed cluster?

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('Test').getOrCreate()
sc = spark.sparkContext
# Look up the configured executor count in the active SparkConf
print(sc.getConf().get('spark.executor.instances'))
```

A. Number of executors automatically set by the cloud manager (e.g., '4')

B. Raises AttributeError

D. None
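
Two documented behaviors help when reasoning about this snippet. PySpark's `SparkConf.get(key)` returns `None` when the key was never set (much like `dict.get`), and with dynamic allocation enabled the Spark configuration docs state that the initial executor count is the largest of `spark.dynamicAllocation.minExecutors`, `spark.dynamicAllocation.initialExecutors` (which defaults to `minExecutors`), and `spark.executor.instances`. The sketch below models both with a plain dict standing in for the cluster's SparkConf; it is a simplified model rather than Spark's code, and the helper names and sample settings are hypothetical.

```python
from typing import Dict, Optional

# A plain dict stands in for the cluster's SparkConf (hypothetical settings).
Conf = Dict[str, str]


def executor_instances(conf: Conf) -> Optional[str]:
    """Mirror SparkConf.get(key): None when the key was never set."""
    return conf.get("spark.executor.instances")


def initial_executors(conf: Conf) -> int:
    """Initial executor count under dynamic allocation (simplified model).

    initialExecutors defaults to minExecutors (which defaults to 0), and
    spark.executor.instances wins if it is larger than both.
    """
    min_execs = int(conf.get("spark.dynamicAllocation.minExecutors", "0"))
    init_execs = int(conf.get("spark.dynamicAllocation.initialExecutors", str(min_execs)))
    instances = int(conf.get("spark.executor.instances", "0"))
    return max(min_execs, init_execs, instances)


autoscaled = {
    "spark.dynamicAllocation.enabled": "true",
    "spark.dynamicAllocation.minExecutors": "1",
}
pinned = {**autoscaled, "spark.executor.instances": "4"}

print(executor_instances(autoscaled))  # None: the key is unset
print(initial_executors(autoscaled))   # 1: falls back to minExecutors
print(executor_instances(pinned), initial_executors(pinned))  # 4 4 (a string, then an int)
```

Whether a given managed cluster sets `spark.executor.instances` at all depends on the provider's defaults, so the printed value is cluster-specific.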

❓ Data Output

advanced

Result of dynamic scaling in a cloud Spark cluster

Given a Spark job running on a cloud cluster with dynamic scaling enabled, what is the expected change in the number of executors after a workload spike?

A. The number of executors decreases to save costs during the spike.

B. The job fails due to insufficient executors.

C. The number of executors remains fixed regardless of workload.

D. The number of executors increases automatically to handle the spike.
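
The Spark job-scheduling docs describe how dynamic allocation reacts to a backlog: once tasks have been pending for `spark.dynamicAllocation.schedulerBacklogTimeout`, Spark requests executors in rounds (repeated every `sustainedSchedulerBacklogTimeout` while the backlog persists), adding 1, then 2, then 4, then 8 executors and so on, bounded by `spark.dynamicAllocation.maxExecutors` and by how many executors the tasks actually need. The simulation below is a simplified model of one sustained spike; it ignores timers, task completion, and idle-executor removal (`executorIdleTimeout`), and the function names and inputs are hypothetical.

```python
import math
from typing import List


def executors_needed(tasks: int, executor_cores: int, task_cpus: int = 1) -> int:
    """Executors required to run every pending/running task at once."""
    tasks_per_executor = executor_cores // task_cpus
    return math.ceil(tasks / tasks_per_executor)


def ramp_up(current: int, tasks: int, executor_cores: int,
            max_executors: int, rounds: int) -> List[int]:
    """Executor count after each backlog round (simplified model).

    Each round adds twice as many executors as the previous one (1, 2, 4, ...),
    capped by max_executors and by the executors the backlog actually needs.
    """
    cap = min(max_executors, executors_needed(tasks, executor_cores))
    history = [current]
    to_add = 1
    for _ in range(rounds):
        if current >= cap:
            break  # backlog fully covered or upper bound reached
        current = min(current + to_add, cap)
        to_add *= 2
        history.append(current)
    return history


# Hypothetical spike: 100 tasks, 4-core executors, maxExecutors = 20.
print(ramp_up(current=2, tasks=100, executor_cores=4, max_executors=20, rounds=10))
# [2, 3, 5, 9, 17, 20]
```

The exponential step lets the count catch up with a large spike in a few rounds while starting cautiously for small ones.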

❓ Visualization

advanced

Visualizing Spark job stages on a cloud vs. a local cluster

Which visualization best represents the difference in Spark job stage execution times between a cloud-managed cluster and a local cluster?

A. A bar chart showing shorter and more consistent stage times on the cloud cluster.

B. A line chart showing longer stage times on the cloud cluster due to network delays.

C. A pie chart showing equal time distribution for all stages on both clusters.

D. A scatter plot showing random stage times with no pattern.
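
If you have per-stage wall-clock durations for the same job on both clusters (for example, read off the Stages tab of the Spark UI or the monitoring REST API), placing the two bars for each stage side by side makes both the level and the spread of stage times easy to compare. The sketch below renders such a grouped chart as text using only the standard library; the function name and the input shape (a dict mapping stage ID to seconds, one per cluster) are assumptions for illustration.

```python
from typing import Dict


def stage_bar_chart(cloud: Dict[int, float], local: Dict[int, float], width: int = 40) -> str:
    """Render per-stage durations (seconds) for two clusters as grouped text bars."""
    stages = sorted(set(cloud) | set(local))
    longest = max(list(cloud.values()) + list(local.values()), default=0.0)
    lines = []
    for stage in stages:
        for label, durations in (("cloud", cloud), ("local", local)):
            secs = durations.get(stage)
            if secs is None:
                continue  # stage missing from this cluster's run
            # Bar length is proportional to the longest stage across both clusters.
            bar = "#" * round(width * secs / longest) if longest > 0 else ""
            lines.append(f"stage {stage:>3} {label:<5} |{bar} {secs:.1f}s")
    return "\n".join(lines)


# Placeholder numbers for illustration only, not measurements.
print(stage_bar_chart({0: 10.0, 1: 20.0}, {0: 5.0, 1: 25.0}, width=20))
```

Scaling every bar against one shared maximum keeps the two clusters directly comparable; a plotting library would follow the same grouping.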

🚀 Application

expert

Choosing cloud features to simplify Spark job deployment

You want to simplify Spark job deployment and management. Which cloud feature should you prioritize to achieve this?

A. Using local on-premises storage with cloud compute.

B. Managed Spark services with auto-scaling and integrated storage.

C. Manual cluster setup with fixed resource allocation.

D. Disabling dynamic resource allocation to control costs.
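
Managed services typically expose auto-scaling through the provider's cluster settings; on the Spark side, the mechanism that grows and shrinks the executor pool is dynamic allocation. As a hedged sketch, the helper below builds a `spark-submit` command that enables it using real Spark configuration keys. The bounds, the `job.py` application path, and the helper name are hypothetical; `spark.dynamicAllocation.shuffleTracking.enabled` (Spark 3.0+) is one way to satisfy dynamic allocation's shuffle requirement when no external shuffle service is running.

```python
from typing import Dict, List

# Real Spark keys; the values are hypothetical bounds for illustration.
DYNAMIC_ALLOCATION = {
    "spark.dynamicAllocation.enabled": "true",
    "spark.dynamicAllocation.shuffleTracking.enabled": "true",  # Spark 3.0+
    "spark.dynamicAllocation.minExecutors": "1",
    "spark.dynamicAllocation.maxExecutors": "20",
}


def spark_submit_cmd(app: str, conf: Dict[str, str]) -> List[str]:
    """Build a spark-submit argument list with one --conf flag per setting."""
    cmd = ["spark-submit"]
    for key, value in sorted(conf.items()):
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(app)  # the application file comes after all options
    return cmd


# "job.py" is a hypothetical application file; values contain no spaces,
# so a plain join is safe for display here.
print(" ".join(spark_submit_cmd("job.py", DYNAMIC_ALLOCATION)))
```

Returning a list rather than a single string lets the command be passed straight to `subprocess.run` without shell quoting issues.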
