What is the primary effect of increasing the number of worker nodes in a Spark cluster on job execution?
Think about how parallel processing works in Spark.
Adding more worker nodes allows Spark to distribute tasks across more machines, reducing total execution time by parallelizing work.
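The speedup from parallelism can be sketched with a toy model (illustrative only, not Spark code): with perfectly divisible work, tasks run in waves of one task per worker, so wall-clock time shrinks roughly as ceil(tasks / workers) × time-per-task.

```python
import math

def ideal_runtime(num_tasks, num_workers, task_seconds):
    """Ideal wall-clock time: tasks execute in waves of num_workers at a time."""
    waves = math.ceil(num_tasks / num_workers)
    return waves * task_seconds

# Doubling the workers roughly halves the runtime for 100 one-second tasks.
for workers in (2, 4, 8):
    print(workers, "workers ->", ideal_runtime(100, workers, 1.0), "s")
```

Real clusters fall short of this ideal because of scheduling overhead, data skew, and shuffle costs, but the trend holds: more workers, more tasks in flight, less total time.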
Given the following Spark configuration snippet, what will happen when the workload decreases significantly?
spark.conf.set("spark.dynamicAllocation.enabled", "true")
spark.conf.set("spark.dynamicAllocation.minExecutors", "2")
spark.conf.set("spark.dynamicAllocation.maxExecutors", "10")
spark.conf.set("spark.dynamicAllocation.initialExecutors", "5")
Consider the minimum executors setting in dynamic allocation.
Dynamic allocation scales the number of executors between the configured minimum and maximum based on workload. When the workload decreases significantly, idle executors are released, but the count never drops below the minimum of 2 executors.
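The clamping behavior can be sketched as a simplified model (not Spark's actual scheduler logic): the requested executor count tracks demand but is always clamped to the configured [min, max] range.

```python
def target_executors(pending_tasks, min_executors=2, max_executors=10):
    # Simplified dynamic allocation: request roughly one executor per
    # pending task, clamped to the configured minimum and maximum.
    return max(min_executors, min(max_executors, pending_tasks))

print(target_executors(0))    # workload drained: held at the floor of 2
print(target_executors(50))   # spike: capped at the ceiling of 10
```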
Given this Spark cluster utilization data collected over 5 minutes:
Minute 1: CPU Usage 85%, Memory Usage 70%
Minute 2: CPU Usage 90%, Memory Usage 75%
Minute 3: CPU Usage 95%, Memory Usage 80%
Minute 4: CPU Usage 92%, Memory Usage 78%
Minute 5: CPU Usage 88%, Memory Usage 74%
What is the best interpretation of this data regarding cluster sizing?
Look at CPU and memory usage percentages to assess load.
CPU usage consistently at or above 85% across all five minutes indicates sustained high load, suggesting the cluster may need additional nodes to handle the workload efficiently.
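A quick check of the sample data confirms the sustained-load reading: average CPU sits at 90% and every minute is at or above the 85% threshold.

```python
# (minute, cpu_pct, mem_pct) samples from the 5-minute window above.
samples = [(1, 85, 70), (2, 90, 75), (3, 95, 80), (4, 92, 78), (5, 88, 74)]

avg_cpu = sum(c for _, c, _ in samples) / len(samples)
avg_mem = sum(m for _, _, m in samples) / len(samples)
sustained_high_cpu = all(c >= 85 for _, c, _ in samples)

print(f"avg CPU: {avg_cpu}%, avg memory: {avg_mem}%")  # 90.0%, 75.4%
print("sustained high CPU:", sustained_high_cpu)       # True
```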
Review this Spark configuration snippet and identify the issue that prevents auto-scaling from working properly:
spark.conf.set("spark.dynamicAllocation.enabled", "true")
spark.conf.set("spark.dynamicAllocation.minExecutors", "5")
spark.conf.set("spark.dynamicAllocation.maxExecutors", "3")
spark.conf.set("spark.dynamicAllocation.initialExecutors", "4")
Check the relationship between minExecutors and maxExecutors values.
minExecutors must be less than or equal to maxExecutors. Here minExecutors=5 is greater than maxExecutors=3, causing a conflict that prevents proper auto-scaling.
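A hypothetical validation helper (`validate_dynamic_allocation` is not a Spark API, just an illustration) makes the constraint explicit: min ≤ initial ≤ max must hold for the settings to be consistent.

```python
def validate_dynamic_allocation(conf):
    """Return a list of consistency errors in dynamic-allocation settings."""
    mn = int(conf["spark.dynamicAllocation.minExecutors"])
    mx = int(conf["spark.dynamicAllocation.maxExecutors"])
    init = int(conf["spark.dynamicAllocation.initialExecutors"])
    errors = []
    if mn > mx:
        errors.append(f"minExecutors ({mn}) exceeds maxExecutors ({mx})")
    if not (mn <= init <= mx):
        errors.append(f"initialExecutors ({init}) outside [{mn}, {mx}]")
    return errors

bad = {
    "spark.dynamicAllocation.minExecutors": "5",
    "spark.dynamicAllocation.maxExecutors": "3",
    "spark.dynamicAllocation.initialExecutors": "4",
}
print(validate_dynamic_allocation(bad))  # reports both violations
```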
You manage a Spark cluster with dynamic workloads that spike unpredictably. Which strategy best balances cost and performance using auto-scaling?
Consider how dynamic allocation adapts to workload changes.
Setting a low minimum executor count keeps costs down during idle periods, while a high maximum lets the cluster scale up quickly during spikes, balancing cost and performance.
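A configuration sketch of this strategy, in the same style as the snippets above (the specific values are illustrative, not prescriptive; tune them to your workload):

```python
spark.conf.set("spark.dynamicAllocation.enabled", "true")
spark.conf.set("spark.dynamicAllocation.minExecutors", "1")   # low floor keeps idle cost down
spark.conf.set("spark.dynamicAllocation.maxExecutors", "50")  # high ceiling absorbs spikes
spark.conf.set("spark.dynamicAllocation.executorIdleTimeout", "60s")  # release idle executors promptly
```

The idle-timeout setting complements the low minimum: executors acquired during a spike are returned quickly once the spike passes, so you pay for the high ceiling only while you need it.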