Challenge - 5 Problems
Spot Instance Savings Master
Get all challenges correct to earn this badge!
Test your skills under time pressure!
❓ Predict Output
Intermediate · 2:00 remaining
Output of Spark job with spot instance interruption handling
Consider a Spark job running on spot instances that can be interrupted. What will be the output count of processed records if the job is configured to retry 2 times on failure and the spot instance is interrupted once during processing of 100 records?
Apache Spark
val data = spark.sparkContext.parallelize(1 to 100)
val processed = data.map(x => {
  if (x == 50) throw new RuntimeException("Spot instance interrupted")
  else x
}).count()
💡 Hint
Think about how retries affect the total processed count.
With retries configured, Spark re-runs the failed task after the spot interruption; once the transient failure clears, all 100 records are processed and count() returns 100. (The sample code simulates the interruption with a deterministic exception at x == 50; a real interruption is transient, which is why the retry can succeed.)
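The retry behavior is easier to see if the simulated failure is made transient rather than deterministic. A hypothetical sketch, assuming a SparkSession named `spark` is in scope:

```scala
import org.apache.spark.TaskContext

// Sketch: fail only on the first attempt of the task, the way a transient
// spot interruption would, so Spark's task retry succeeds.
val data = spark.sparkContext.parallelize(1 to 100)
val processed = data.map { x =>
  if (x == 50 && TaskContext.get.attemptNumber == 0)
    throw new RuntimeException("Spot instance interrupted")
  x
}.count()
// The first attempt of the partition containing 50 fails; the retried
// attempt sees attemptNumber == 1, passes, and processed is 100.
```

This needs a running Spark cluster to execute; the point is only that a retry changes the outcome when the failure does not repeat.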
🧠 Conceptual
Intermediate · 1:30 remaining
Understanding cost savings with spot instances
Which of the following best explains why spot instances reduce costs in cloud-based Spark clusters?
💡 Hint
Think about how cloud providers price unused resources.
Spot instances are sold from the provider's spare capacity at a steep discount (often advertised at up to 90% off on-demand prices) precisely because the provider can reclaim them at any time: you trade availability guarantees for price.
🔧 Debug
Advanced · 2:00 remaining
Identify the error in Spark spot instance handling code
What error will this Spark code produce when running on spot instances with possible interruptions?
Apache Spark
val rdd = spark.sparkContext.parallelize(1 to 10)
val result = rdd.map(x => if (x == 5) throw new Exception("Instance lost") else x).collect()
println(result.mkString(","))
💡 Hint
What happens when an exception is thrown inside a Spark transformation?
The exception thrown inside the map transformation fails the running task; Spark retries the task up to spark.task.maxFailures times (4 by default), but because the exception is deterministic every retry fails too, so the job aborts with a SparkException wrapping the original "Instance lost" exception and collect() never returns.
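A defensive variant, shown here as a sketch rather than the quiz's intended fix, wraps the per-record work in scala.util.Try so one bad record drops out instead of aborting the whole job:

```scala
import scala.util.Try

val rdd = spark.sparkContext.parallelize(1 to 10)
// Records whose processing throws are silently dropped rather than
// failing the task; whether that is acceptable depends on the job.
val result = rdd.flatMap { x =>
  Try(if (x == 5) throw new Exception("Instance lost") else x).toOption
}.collect()
println(result.mkString(","))  // the record 5 is missing from the output
```

Note this masks genuine spot interruptions too; for infrastructure failures (as opposed to bad data), relying on Spark's task retries is usually the better choice.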
❓ Data Output
Advanced · 2:00 remaining
Data output after handling spot instance interruptions with checkpointing
Given a Spark job that uses checkpointing to handle spot instance interruptions, what will be the output count after processing 200 records with an interruption at record 150?
Apache Spark
spark.sparkContext.setCheckpointDir("/tmp/checkpoint")
val data = spark.sparkContext.parallelize(1 to 200)
val processed = data.map(x => {
  if (x == 150) throw new RuntimeException("Spot instance interrupted")
  else x
})
processed.checkpoint() // checkpoint() returns Unit, so it cannot be chained before count()
val count = processed.count()
💡 Hint
Checkpointing helps recover from failures by saving progress.
Checkpointing writes the RDD to reliable storage, so after the interruption Spark recovers from the checkpoint instead of recomputing the entire lineage; all 200 records end up processed and the count is 200.
🚀 Application
Expert · 2:30 remaining
Optimizing Spark cluster cost with spot instances and autoscaling
You have a Spark cluster running on spot instances with autoscaling enabled. Which strategy will best minimize cost while maintaining job reliability?
💡 Hint
Consider balancing cost savings and job reliability.
Mixing spot and on-demand instances with autoscaling minimizes cost while preserving reliability: on-demand capacity hosts the driver and a baseline of executors, spot capacity absorbs the bulk of the compute at a discount, and autoscaling replaces spot executors as they are reclaimed.
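Instance placement itself is handled by the cluster manager, but on the Spark side a few settings pair well with this strategy. A hypothetical configuration sketch (the conf keys are real Spark settings, Spark 3.1+ for decommissioning; the values are illustrative):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("spot-aware-job")
  .config("spark.dynamicAllocation.enabled", "true")     // autoscale executors with load
  .config("spark.decommission.enabled", "true")          // drain executors on a spot reclaim notice
  .config("spark.storage.decommission.enabled", "true")  // migrate cached RDD blocks off the departing node
  .getOrCreate()
```

Graceful decommissioning turns a spot reclaim from an abrupt task failure into a planned handoff, which reduces recomputation when autoscaling brings up a replacement executor.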