
Spot instances for cost savings in Apache Spark - Step-by-Step Execution

Concept Flow - Spot instances for cost savings
Request Spot Instance
Instance Launched if Available
Run Spark Job
Spot Instance May be Interrupted
Job Interrupted
Handle Interruption: Retry or Save State
Cost Savings Achieved
This flow shows how spot instances are requested, how Spark jobs run on them, how those instances may be interrupted, and how handling interruptions still leads to cost savings.
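The flow above can be sketched as a plain-Python retry loop. This is a minimal illustration, not Spark's actual scheduler: `run_job`, `SpotInterruption`, and the "first attempt is interrupted" behavior are all simulated for the example.

```python
class SpotInterruption(Exception):
    """Raised when the cloud provider reclaims the spot instance (simulated)."""

def run_job(attempt):
    # Simulated job: the first attempt is interrupted, the retry succeeds.
    if attempt == 1:
        raise SpotInterruption("spot capacity reclaimed")
    return list(range(10))  # the job's result

def run_with_retries(max_attempts=3):
    for attempt in range(1, max_attempts + 1):
        try:
            return run_job(attempt)  # "Run Spark Job"
        except SpotInterruption:
            # "Handle Interruption: Retry or Save State"
            print(f"attempt {attempt} interrupted, retrying")
    raise RuntimeError("job failed after all retries")

print(run_with_retries())  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

In a real deployment the loop lives in your orchestrator (Airflow, Step Functions, a shell wrapper around `spark-submit`), and "save state" means checkpointing intermediate results so a retry does not start from zero.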
Execution Sample
Apache Spark
# Typically set at session launch (e.g. via spark-submit --conf);
# dynamic allocation lets Spark replace executors lost to interruptions.
spark.conf.set("spark.dynamicAllocation.enabled", "true")
spark.conf.set("spark.executor.instances", "2")
# Note: there is no "spark.executor.spot" setting. Whether executors run
# on spot capacity is decided by the cluster manager (e.g. EMR instance
# fleets, Databricks spot pools, Kubernetes node selectors), not by spark.conf.

rdd = spark.sparkContext.parallelize(range(10))
print(rdd.collect())  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
This code enables dynamic allocation and runs a simple job collecting the numbers 0 to 9; placing the executors on spot capacity is configured in the cluster manager, not in Spark itself.
Execution Table
Step | Action | Spot Instance Status | Job Status | Output
1 | Request spot instances | Requested | Pending | No output yet
2 | Spot instances launched | Active | Starting job | No output yet
3 | Run Spark job tasks | Active | Running | Partial results processed
4 | Spot instance interruption check | Active | Running | Partial results processed
5 | Spot instance interrupted | Interrupted | Job failed | Job interrupted error
6 | Handle interruption: retry job | Requested | Retrying | No output yet
7 | Spot instances relaunched | Active | Running | Partial results processed
8 | Job completes successfully | Active | Completed | [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
9 | Calculate cost savings | N/A | N/A | Cost reduced by using spot instances
💡 The job either completes successfully or is interrupted and retried until completion, achieving cost savings either way.
Variable Tracker
Variable | Start | After Step 2 | After Step 5 | After Step 7 | Final
spot_instance_status | None | Active | Interrupted | Active | Active
job_status | Pending | Starting job | Job failed | Running | Completed
output | None | None | Job interrupted error | Partial results | [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
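The variable tracker can be reproduced with a small state-update sketch in plain Python, where each event hard-codes the transition from one table row to the next (simulation only, mirroring the lesson's values):

```python
# Each event mirrors a row of the execution table that changes state.
events = [
    ("launch",    {"spot_instance_status": "Active",      "job_status": "Starting job", "output": None}),                     # step 2
    ("interrupt", {"spot_instance_status": "Interrupted", "job_status": "Job failed",   "output": "Job interrupted error"}),  # step 5
    ("relaunch",  {"spot_instance_status": "Active",      "job_status": "Running",      "output": "Partial results"}),        # step 7
    ("complete",  {"spot_instance_status": "Active",      "job_status": "Completed",    "output": list(range(10))}),          # step 8
]

# Starting state, matching the "Start" column of the tracker.
state = {"spot_instance_status": None, "job_status": "Pending", "output": None}
for name, changes in events:
    state.update(changes)

print(state["job_status"], state["output"])  # Completed [0, 1, ..., 9]
```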
Key Moments - 3 Insights
Why does the job fail at step 5 even though the spot instance was active before?
At step 5, the spot instance is interrupted by the cloud provider, causing the job to fail because the instance is no longer available to run tasks, as shown in row 5 of the execution table.
How does Spark handle the interruption to still complete the job?
Spark retries the job by requesting new spot instances and rerunning tasks, as seen in steps 6 and 7 of the execution table, allowing the job to eventually complete.
What is the main benefit of using spot instances despite interruptions?
The main benefit is cost savings, because spot instances are cheaper than regular instances, and handling interruptions with retries still results in lower overall cost, as summarized in step 9.
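As a back-of-the-envelope check on that claim (illustrative prices only; actual spot discounts of roughly 60-90% and the retry overhead vary by provider, region, and instance type):

```python
ON_DEMAND_RATE = 1.00  # $/instance-hour (hypothetical price)
SPOT_RATE      = 0.30  # $/instance-hour (hypothetical ~70% discount)

job_hours_on_demand = 2.0   # clean run on on-demand instances
job_hours_on_spot   = 2.5   # same job plus one interruption and retry

on_demand_cost = ON_DEMAND_RATE * job_hours_on_demand  # $2.00
spot_cost      = SPOT_RATE * job_hours_on_spot         # $0.75
savings_pct    = 100 * (1 - spot_cost / on_demand_cost)

print(f"spot ${spot_cost:.2f} vs on-demand ${on_demand_cost:.2f} "
      f"({savings_pct:.0f}% cheaper despite the retry)")
```

Even with 25% extra runtime lost to the interruption, the spot run costs well under half of the on-demand run, which is why the retry overhead in steps 5-7 does not erase the savings.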
Visual Quiz - 3 Questions
Test your understanding
Look at the execution table, what is the job status at step 4?
A. Pending
B. Running
C. Completed
D. Job failed
💡 Hint
Refer to row 4 of the execution table, under the 'Job Status' column.
At which step does the spot instance get interrupted?
A. Step 7
B. Step 3
C. Step 5
D. Step 9
💡 Hint
Check row 5 of the execution table under 'Spot Instance Status'.
If the job never got interrupted, which step would be skipped?
A. Step 6 (Handle interruption: retry job)
B. Step 2 (Spot instances launched)
C. Step 8 (Job completes successfully)
D. Step 1 (Request spot instances)
💡 Hint
Look at rows 5 and 6 of the execution table to see what happens after an interruption.
Concept Snapshot
Spot instances are cheaper cloud servers that can be interrupted.
Spark can run jobs on spot instances to save costs.
Jobs may fail if spot instances are interrupted.
Spark retries jobs automatically to handle interruptions.
This approach reduces cost but requires handling possible job restarts.
Full Transcript
Spot instances are cloud servers offered at lower prices that can be reclaimed by the provider at any time. When running Spark jobs on spot instances, the job starts by requesting these instances. If the instances are available, the job runs. However, spot instances can be interrupted, causing the job to fail. Spark handles this by retrying the job on new spot instances until it completes. This retry mechanism preserves the cost savings while still ensuring job completion. The execution table walks through each step: requesting spot instances, running the job, handling interruptions, and finally completing the job with cost savings.