Pandas · Data · ~30 mins

Cross-tabulation advanced usage in Pandas - Mini Project: Build & Apply

📖 Scenario: You work as a data analyst for a retail company. You have sales data that includes the product category, the region where the product was sold, and whether the sale was made online or in-store. Your manager wants to understand the relationship between product categories, sales channels, and regions to improve marketing strategies.
🎯 Goal: Build a cross-tabulation table using pandas that shows the count of sales for each product category by sales channel and region. Then, add margins (totals) and normalize the data by row to see proportions.
📋 What You'll Learn
Create a pandas DataFrame with the exact sales data provided.
Create a variable for the normalization axis.
Use pd.crosstab to cross-tabulate product category against sales channel and region, with multi-index columns.
Add margins (totals) to the cross-tabulation.
Normalize the cross-tabulation by rows using the normalization axis variable.
Print the final normalized cross-tabulation table.
💡 Why This Matters
🌍 Real World
Cross-tabulation helps businesses analyze relationships between multiple categorical variables, such as product sales by region and channel, to make informed decisions.
💼 Career
Data analysts and data scientists use cross-tabulation to summarize and explore data patterns, which supports marketing, sales, and operational strategies.
Step 1: Create the sales data DataFrame
Create a pandas DataFrame called sales_data with these exact columns and values:
Product: ['Shoes', 'Shoes', 'Shoes', 'Shirts', 'Shirts', 'Shirts', 'Hats', 'Hats', 'Hats', 'Shoes', 'Shirts', 'Hats']
Region: ['North', 'South', 'East', 'North', 'South', 'East', 'North', 'South', 'East', 'North', 'South', 'East']
Channel: ['Online', 'In-store', 'Online', 'In-store', 'Online', 'In-store', 'Online', 'In-store', 'Online', 'In-store', 'Online', 'In-store']
Hint:

Use pd.DataFrame with a dictionary containing the exact lists for 'Product', 'Region', and 'Channel'.
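
A minimal sketch of Step 1, assuming pandas is imported under its conventional alias pd:

```python
import pandas as pd

# Build the sales data exactly as listed in Step 1
sales_data = pd.DataFrame({
    "Product": ["Shoes", "Shoes", "Shoes", "Shirts", "Shirts", "Shirts",
                "Hats", "Hats", "Hats", "Shoes", "Shirts", "Hats"],
    "Region": ["North", "South", "East", "North", "South", "East",
               "North", "South", "East", "North", "South", "East"],
    "Channel": ["Online", "In-store", "Online", "In-store", "Online", "In-store",
                "Online", "In-store", "Online", "In-store", "Online", "In-store"],
})

print(sales_data.shape)  # (12, 3)
print(sales_data["Product"].value_counts())
```

Each product appears four times, which is a quick check that the lists were copied correctly.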

Step 2: Set normalization axis variable
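Create a variable called norm_axis and set it to 1. In pandas reductions such as DataFrame.sum, axis=1 adds across the columns and returns one total per row, which is the divisor you need to normalize by rows. Do not pass this value to the normalize parameter of pd.crosstab: there, 1 means 'columns' (each column sums to 1), and row-wise normalization is 'index' or 0.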
Hint:

Set norm_axis = 1; you will pass it as the axis argument when summing each row of the table in Step 3.
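
The sketch below shows the axis convention this step relies on, using small made-up tables (counts, product and channel are illustrative names and values, not the sales data). It also shows the pitfall with pd.crosstab, where normalize=1 is read as 'columns' and therefore gives column proportions rather than row proportions.

```python
import pandas as pd

# Illustrative counts only (not the sales data): products on the rows,
# sales channels on the columns.
counts = pd.DataFrame(
    {"In-store": [1, 2], "Online": [3, 2]},
    index=["Hats", "Shirts"],
)

norm_axis = 1  # axis=1: add across the columns, one result per row

row_totals = counts.sum(axis=norm_axis)     # one total per product row
row_props = counts.div(row_totals, axis=0)  # each row divided by its own total
print(row_props)

# Pitfall: pd.crosstab reads normalize=1 as 'columns', not rows.
product = pd.Series(["Hats", "Hats", "Shirts"], name="Product")
channel = pd.Series(["Online", "In-store", "Online"], name="Channel")
col_props = pd.crosstab(product, channel, normalize=1)           # each column sums to 1
row_props_ct = pd.crosstab(product, channel, normalize="index")  # each row sums to 1
```

Dividing with axis=0 aligns row_totals on the row labels, so every cell is divided by the total of its own row.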

Step 3: Create cross-tabulation with margins and normalization
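Use pd.crosstab to create a cross-tabulation called cross_tab that counts sales with sales_data['Product'] as the rows and the combination of sales_data['Channel'] and sales_data['Region'] as the columns. Add margins (totals) with margins=True. Then normalize the table by rows, using norm_axis to compute each row's total. Because margins=True adds an 'All' column that already holds each row's total, leave that column out of the sum; otherwise every total is doubled and every proportion is halved.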
Hint:

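Pass a list of columns, [sales_data['Channel'], sales_data['Region']], as the columns argument to get multi-index columns, and use margins=True to add totals. Then compute the row totals with .sum(axis=norm_axis) over every column except 'All', and divide with .div(row_totals, axis=0) so each row is divided by its own total.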
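
One way to complete Step 3, as a sketch: counts, row_totals and builtin are helper names introduced here, and the code assumes the 'All' margin is the last column, which is where pd.crosstab places it. The final call uses the built-in normalize='index' option for comparison; it also returns row proportions but may lay out the margins differently, so the two tables are not expected to be identical.

```python
import pandas as pd

sales_data = pd.DataFrame({
    "Product": ["Shoes", "Shoes", "Shoes", "Shirts", "Shirts", "Shirts",
                "Hats", "Hats", "Hats", "Shoes", "Shirts", "Hats"],
    "Region": ["North", "South", "East", "North", "South", "East",
               "North", "South", "East", "North", "South", "East"],
    "Channel": ["Online", "In-store", "Online", "In-store", "Online", "In-store",
                "Online", "In-store", "Online", "In-store", "Online", "In-store"],
})

norm_axis = 1  # sum across the columns -> one total per row

# Product on the rows; (Channel, Region) pairs become multi-index columns.
# margins=True appends an "All" row and an "All" column holding the totals.
counts = pd.crosstab(
    sales_data["Product"],
    [sales_data["Channel"], sales_data["Region"]],
    margins=True,
)

# Row totals from the data columns only: the "All" margin column is last,
# and including it in the sum would double every total.
row_totals = counts.iloc[:, :-1].sum(axis=norm_axis)

# Divide each row (its "All" cell included) by its own total: the data
# columns of every row then sum to 1, and the "All" column becomes 1.0.
cross_tab = counts.div(row_totals, axis=0)

# Built-in alternative: normalize="index" (or 0) gives row proportions directly.
builtin = pd.crosstab(
    sales_data["Product"],
    [sales_data["Channel"], sales_data["Region"]],
    margins=True,
    normalize="index",
)
```

The manual route keeps both margins visible, which matches the goal of showing a normalized table with totals; the built-in option is shorter when only the proportions matter.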

Step 4: Print the normalized cross-tabulation
Print the variable cross_tab to display the normalized cross-tabulation table with margins.
Hint:

Use print(cross_tab) to display the final normalized table.
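
A plain print(cross_tab) is all this step needs. If the wide multi-index table wraps or shows long decimals, the display can be tuned without changing the stored values. The sketch below uses a small stand-in table named table (hypothetical, not the real cross_tab) so it runs on its own.

```python
import pandas as pd

# Hypothetical stand-in for cross_tab: a small table of row proportions
table = pd.DataFrame(
    {"In-store": [0.25, 1 / 3], "Online": [0.75, 2 / 3]},
    index=pd.Index(["Hats", "Shirts"], name="Product"),
)

# Plain print, as in Step 4
print(table)

# Rounded copy for display; the values stored in table are unchanged
print(table.round(3))

# Temporary display settings that apply only inside the with block:
# no column truncation, a wider line, and 3 decimal places.
with pd.option_context(
    "display.max_columns", None,
    "display.width", 120,
    "display.precision", 3,
):
    print(table)
```

pd.option_context restores the previous settings when the block exits, so the rest of the session keeps its defaults.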