Apache Spark · ~20 mins

Spot instances for cost savings in Apache Spark - Practice Problems & Coding Challenges

Challenge - 5 Problems
🔮 Predict Output (intermediate)
Output of Spark job with spot instance interruption handling
Consider a Spark job running on spot instances. Its tasks are configured to retry up to 2 times on failure, and the spot instance is interrupted once while 100 records are being processed (assume the exception below simulates that one-time interruption, i.e. it fires only on the first task attempt). What will the final processed-record count be?
Apache Spark
val data = spark.sparkContext.parallelize(1 to 100)
val processed = data.map(x => {
  if (x == 50) throw new RuntimeException("Spot instance interrupted")
  else x
}).count()
A. 0
B. 50
C. 100
D. 150
💡 Hint
Think about how retries affect the total processed count.
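To make the hint concrete, here is a plain-Scala sketch of Spark-style task retries (no Spark required; `RetrySim`, `runTask`, and `runWithRetries` are hypothetical helpers, not Spark APIs). A failed task is re-run from scratch on a replacement instance, so one recovered interruption still yields the full count:

```scala
// Hypothetical simulation of Spark-style task retries (not Spark's actual scheduler).
object RetrySim {
  var interrupted = false // tracks whether the simulated spot interruption has fired

  // One "task" over a partition: fails once mid-processing, succeeds on retry.
  def runTask(records: Range): Int = {
    if (!interrupted) {
      interrupted = true // first attempt: spot instance reclaimed
      throw new RuntimeException("Spot instance interrupted")
    }
    records.size // retry on a replacement instance processes every record
  }

  // Retry the task up to maxFailures attempts, analogous to spark.task.maxFailures.
  def runWithRetries(maxFailures: Int, attempt: Int = 1): Int =
    try runTask(1 to 100)
    catch {
      case e: RuntimeException if attempt < maxFailures =>
        runWithRetries(maxFailures, attempt + 1)
    }

  def main(args: Array[String]): Unit =
    println(runWithRetries(maxFailures = 3)) // prints 100
}
```

Note the contrast with the snippet above: there the exception fires on every attempt, so retries cannot save the job.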
🧠 Conceptual (intermediate)
Understanding cost savings with spot instances
Which of the following best explains why spot instances reduce costs in cloud-based Spark clusters?
A. Spot instances are cheaper because they use idle cloud capacity that can be reclaimed anytime.
B. Spot instances are cheaper because they use older hardware with lower performance.
C. Spot instances are cheaper because they have fewer security features.
D. Spot instances are cheaper because they run on dedicated physical servers.
💡 Hint
Think about how cloud providers price unused resources.
🔧 Debug (advanced)
Identify the error in Spark spot instance handling code
What error will this Spark code produce when running on spot instances with possible interruptions?
Apache Spark
val rdd = spark.sparkContext.parallelize(1 to 10)
val result = rdd.map(x => if (x == 5) throw new Exception("Instance lost") else x).collect()
println(result.mkString(","))
A. Exception: Instance lost
B. Array(1, 2, 3, 4, 6, 7, 8, 9, 10)
C. NullPointerException
D. Job completes with output 1 to 10
💡 Hint
What happens when an exception is thrown inside a Spark transformation?
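For background on the hint: a transformation like map() is lazy, so nothing fails until an action triggers the job; Spark then retries the failing task, and because this failure is deterministic the retries are exhausted and the job fails on the driver. A sketch of catching that on the driver side, assuming an existing local SparkSession named `spark` (not runnable without a Spark environment):

```scala
import org.apache.spark.SparkException

// map() is lazy; the exception is not raised until collect() runs the job.
val rdd = spark.sparkContext.parallelize(1 to 10)
try {
  rdd.map(x => if (x == 5) throw new Exception("Instance lost") else x).collect()
} catch {
  // After task retries are exhausted, the driver sees a SparkException
  // whose message includes the original "Instance lost" failure reason.
  case e: SparkException => println(s"Job failed: ${e.getMessage}")
}
```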
📊 Data Output (advanced)
Data output after handling spot instance interruptions with checkpointing
Given a Spark job that uses checkpointing to handle spot instance interruptions, what will the output count be after processing 200 records with an interruption at record 150? (Assume the interruption occurs only on the first attempt.)
Apache Spark
spark.sparkContext.setCheckpointDir("/tmp/checkpoint")
val data = spark.sparkContext.parallelize(1 to 200)
val processed = data.map(x => {
  if (x == 150) throw new RuntimeException("Spot instance interrupted")
  else x
})
processed.checkpoint() // RDD.checkpoint() returns Unit, so it cannot be chained before count()
val count = processed.count()
A. 0
B. 150
C. 100
D. 200
💡 Hint
Checkpointing helps recover from failures by saving progress.
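To unpack the hint: checkpoint() writes an RDD's data to reliable storage and truncates its lineage, so if an executor on a spot instance is lost later, tasks recompute from the checkpoint files instead of re-running the whole lineage. A minimal usage sketch, assuming an existing SparkSession `spark` (the map is a stand-in for costly work worth preserving; not runnable without a Spark environment):

```scala
// In production, point this at reliable shared storage (e.g. HDFS or S3).
spark.sparkContext.setCheckpointDir("/tmp/checkpoint")

val base = spark.sparkContext.parallelize(1 to 200)
val expensive = base.map(identity) // placeholder for expensive processing

expensive.checkpoint()        // mark for checkpointing; returns Unit
val count = expensive.count() // the first action materializes the checkpoint

// If a spot instance is reclaimed after this point, recovery reads the
// checkpoint files rather than recomputing from `base`.
```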
🚀 Application (expert)
Optimizing Spark cluster cost with spot instances and autoscaling
You have a Spark cluster running on spot instances with autoscaling enabled. Which strategy will best minimize cost while maintaining job reliability?
A. Use only on-demand instances with autoscaling for maximum reliability.
B. Use a mix of spot and on-demand instances with autoscaling to handle interruptions gracefully.
C. Use only spot instances without autoscaling to avoid interruptions.
D. Disable autoscaling and use spot instances to save cost.
💡 Hint
Consider balancing cost savings and job reliability.
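One way to act on this balance in practice is to enable dynamic allocation plus graceful executor decommissioning, so work and data migrate off spot nodes before they are reclaimed, while the driver and a baseline of executors stay on on-demand capacity. A sketch, not a definitive setup: the decommissioning settings below require Spark 3.1+ and exact behavior depends on your cluster manager.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("spot-tolerant-job")
  // Scale executors with load, so spot capacity is paid for only when needed.
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
  // Gracefully migrate work off executors on reclaimed spot nodes (Spark 3.1+).
  .config("spark.decommission.enabled", "true")
  .config("spark.storage.decommission.enabled", "true")
  .config("spark.storage.decommission.shuffleBlocks.enabled", "true")
  .getOrCreate()
```

Placing the driver and baseline executors on on-demand nodes is handled at the cluster-manager level (e.g. instance groups or node selectors), not in Spark configuration.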