GroupBy and aggregations
📖 Scenario: You work at a small online store. You have a list of sales records with product names and quantities sold. You want to find out how many units of each product were sold in total.
🎯 Goal: Build a Spark DataFrame from sales data, then group the data by product name and calculate the total quantity sold for each product.
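Before any Spark is involved, the computation the goal describes can be sketched in plain Python. This is only an illustration of what "group by product and sum the quantity" means; the sample rows are hypothetical and are not the data provided by the exercise.

```python
from collections import defaultdict

# Hypothetical sales records as (product, quantity) pairs; not the exercise data.
sales = [("apple", 3), ("banana", 5), ("apple", 2), ("cherry", 7), ("banana", 1)]

# Accumulate one running total per product name.
totals = defaultdict(int)
for product, quantity in sales:
    totals[product] += quantity

print(dict(totals))  # {'apple': 5, 'banana': 6, 'cherry': 7}
```

Spark performs the same per-key accumulation, but over a distributed DataFrame rather than an in-memory list.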
📋 What You'll Learn
Create a Spark DataFrame with columns product and quantity using the exact data provided.
Create a variable called grouped_data that groups the DataFrame by product.
Use the agg function with sum aggregation on the quantity column.
Rename the aggregated column to total_quantity.
Print the resulting DataFrame using show().
💡 Why This Matters
🌍 Real World
Grouping and aggregating data is common in sales analysis, inventory management, and reporting to understand totals and summaries.
💼 Career
Data analysts and data scientists use groupBy and aggregation to summarize large datasets and extract meaningful insights for business decisions.