Pandas | data | ~30 mins

filter() for group-level filtering in Pandas - Mini Project: Build & Apply

Filter Groups in a DataFrame Using filter()
📖 Scenario: You work in a small bookstore. You have sales data for different book genres. You want to find which genres have total sales above a certain number.
🎯 Goal: Build a program that uses pandas to group sales by genre and then keep only the rows belonging to genres whose total sales are more than 100.
📋 What You'll Learn
Create a pandas DataFrame with given sales data
Set a sales threshold variable
Use groupby() and filter() to keep only genres with total sales above the threshold
Print the filtered DataFrame
💡 Why This Matters
🌍 Real World
Filtering grouped data is common in sales analysis, marketing, and finance to focus on important categories.
💼 Career
Data analysts and scientists often use group-level filtering to prepare data for reports and decision-making.
1
Create the sales DataFrame
Import pandas as pd. Create a DataFrame called sales_data with columns 'Genre' and 'Sales' using this exact data: 'Fiction' with 50, 'Non-Fiction' with 120, 'Fiction' with 70, 'Sci-Fi' with 90, and 'Non-Fiction' with 30.
Hint: Use pd.DataFrame with a dictionary whose keys are column names and whose values are lists of data.
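One way to write this step is sketched below, using the exact values listed in the instructions. The row order follows the order in which the data is given.

```python
import pandas as pd

# Build the DataFrame from a dictionary: keys become column names,
# and each list supplies that column's values row by row.
sales_data = pd.DataFrame({
    "Genre": ["Fiction", "Non-Fiction", "Fiction", "Sci-Fi", "Non-Fiction"],
    "Sales": [50, 120, 70, 90, 30],
})

print(sales_data)
```

Both lists must have the same length, otherwise pandas raises a ValueError when building the frame.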

2
Set the sales threshold
Create a variable called sales_threshold and set it to 100.
Hint: Just assign the number 100 to the variable sales_threshold.
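The sketch below sets the threshold and, as an optional sanity check, compares it with the per-genre totals. The name genre_totals is introduced here only for illustration and is not required by the exercise.

```python
import pandas as pd

sales_data = pd.DataFrame({
    "Genre": ["Fiction", "Non-Fiction", "Fiction", "Sci-Fi", "Non-Fiction"],
    "Sales": [50, 120, 70, 90, 30],
})

# The threshold each genre's total sales must exceed.
sales_threshold = 100

# Optional check (genre_totals is a hypothetical helper name):
# Fiction = 50 + 70 = 120, Non-Fiction = 120 + 30 = 150, Sci-Fi = 90.
genre_totals = sales_data.groupby("Genre")["Sales"].sum()
print(genre_totals > sales_threshold)
```

With this data, Fiction and Non-Fiction exceed the threshold and Sci-Fi does not, which is what the filtering step should reproduce.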

3
Filter genres with total sales above threshold
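Use groupby('Genre') on sales_data, then call filter() with a lambda function that returns True for groups where the sum of 'Sales' is greater than sales_threshold. filter() keeps every original row of the groups that pass and drops the rest; it does not aggregate. Save the result in a variable called filtered_sales.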
Hint: Use groupby('Genre') then filter() with a lambda that sums 'Sales' and compares the result to sales_threshold.
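A minimal sketch of the filtering step, assuming the DataFrame and threshold from the earlier steps. The lambda's parameter name, group, is arbitrary.

```python
import pandas as pd

sales_data = pd.DataFrame({
    "Genre": ["Fiction", "Non-Fiction", "Fiction", "Sci-Fi", "Non-Fiction"],
    "Sales": [50, 120, 70, 90, 30],
})
sales_threshold = 100

# The lambda receives each genre's sub-DataFrame and must return a single
# True/False value; rows of groups that return False are dropped.
filtered_sales = sales_data.groupby("Genre").filter(
    lambda group: group["Sales"].sum() > sales_threshold
)
```

The result keeps the original rows and index labels of the Fiction and Non-Fiction groups. The single Sci-Fi row is removed because its total of 90 is not above 100.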

4
Print the filtered sales data
Print the variable filtered_sales to see the rows for the genres with total sales above 100.
Hint: Use print(filtered_sales) to show the filtered DataFrame.
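Putting the four steps together, the whole mini project might look like the sketch below. It is one possible solution under the exercise's stated data, not the only valid form.

```python
import pandas as pd

# Step 1: create the sales DataFrame.
sales_data = pd.DataFrame({
    "Genre": ["Fiction", "Non-Fiction", "Fiction", "Sci-Fi", "Non-Fiction"],
    "Sales": [50, 120, 70, 90, 30],
})

# Step 2: set the sales threshold.
sales_threshold = 100

# Step 3: keep only the rows of genres whose total sales exceed the threshold.
filtered_sales = sales_data.groupby("Genre").filter(
    lambda group: group["Sales"].sum() > sales_threshold
)

# Step 4: print the filtered rows.
print(filtered_sales)
```

The printed frame should contain the two Fiction rows and the two Non-Fiction rows, still carrying their original index labels 0, 1, 2 and 4. Calling filtered_sales.reset_index(drop=True) would renumber them from 0 if a clean index is preferred.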