Apache Spark · ~30 mins

Spot instances for cost savings in Apache Spark - Mini Project: Build & Apply

📖 Scenario: You work for a cloud services company that wants to analyze the cost savings from using spot instances instead of on-demand instances. Spot instances are cheaper but can be interrupted. You have data about instance types, their on-demand prices, and spot prices. Your task is to calculate the percentage cost savings for each instance type when using spot instances.
🎯 Goal: Build a Spark DataFrame with instance pricing data, add a configuration for minimum savings threshold, filter instance types that meet or exceed this threshold, and display the results.
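The savings calculation itself is plain arithmetic. As a quick sanity check, here is the formula applied to the m5.large prices from the project data (the variable names are illustrative):

```python
# Percentage saved by paying the spot price instead of the on-demand price,
# using the m5.large prices from the project data (USD per hour).
on_demand_price = 0.096
spot_price = 0.029
percent_savings = (on_demand_price - spot_price) / on_demand_price * 100
print(round(percent_savings, 1))  # roughly 69.8% cheaper
```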
📋 What You'll Learn
Create a Spark DataFrame with instance types and their on-demand and spot prices.
Add a configuration variable for minimum savings percentage.
Calculate the percentage savings for each instance type using spot instances.
Filter the DataFrame to only include instance types with savings greater than or equal to the threshold.
Display the filtered DataFrame.
💡 Why This Matters
🌍 Real World
Cloud engineers and data analysts use this kind of analysis to optimize cloud costs by choosing cheaper spot instances when possible.
💼 Career
Understanding how to manipulate Spark DataFrames and perform cost analysis is valuable for roles in cloud cost management, data engineering, and data science.
1
Create the initial DataFrame with instance pricing
Create a Spark DataFrame called instances_df with the following data: instance_type as ['m5.large', 'c5.large', 'r5.large'], on_demand_price as [0.096, 0.085, 0.126], and spot_price as [0.029, 0.025, 0.038].
Apache Spark
Need a hint?

Use spark.createDataFrame with a list of tuples and specify the column names.

2
Add a minimum savings threshold configuration
Create a variable called min_savings and set it to 60 to represent the minimum percentage savings required to consider an instance type.
Apache Spark
Need a hint?

Just create a variable min_savings and assign the value 60.

3
Calculate percentage savings and filter by threshold
Add a new column called percent_savings to instances_df that calculates the percentage savings as ((on_demand_price - spot_price) / on_demand_price) * 100. Then filter instances_df to only include rows where percent_savings is greater than or equal to min_savings. Save the filtered DataFrame as filtered_df.
Apache Spark
Need a hint?

Use withColumn with col expressions to add percent_savings, then use filter with a col comparison to keep only the qualifying rows.

4
Display the filtered DataFrame
Use filtered_df.show() to display the instance types with percentage savings greater than or equal to min_savings.
Apache Spark
Need a hint?

Use filtered_df.show() to print the DataFrame.