Challenge - 5 Problems
Cross-tabulation Master
❓ Data output
Difficulty: intermediate
Output of simple crosstab() with two categorical columns
Given the DataFrame below, what is the output of the crosstab() function?
Data Analysis Python
import pandas as pd
data = {'Gender': ['Male', 'Female', 'Female', 'Male', 'Female'],
        'Preference': ['Coffee', 'Coffee', 'Tea', 'Coffee', 'Tea']}
df = pd.DataFrame(data)
result = pd.crosstab(df['Gender'], df['Preference'])
print(result)
💡 Hint
Count how many times each Gender prefers Coffee or Tea.
Explanation
The crosstab counts occurrences of each Gender-Preference pair. Female prefers Coffee once and Tea twice. Male prefers Coffee twice and Tea zero times.
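The counts described above can be checked directly. The snippet below is a minimal sketch that rebuilds the same DataFrame and reads individual cells out of the result; the variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Gender': ['Male', 'Female', 'Female', 'Male', 'Female'],
    'Preference': ['Coffee', 'Coffee', 'Tea', 'Coffee', 'Tea'],
})

# Rows come from the first argument, columns from the second.
# Labels are sorted, and pairs that never occur are filled with 0.
result = pd.crosstab(df['Gender'], df['Preference'])
print(result)

# Individual cells can be read by (row label, column label).
female_tea = result.loc['Female', 'Tea']   # 2
male_tea = result.loc['Male', 'Tea']       # 0
print(female_tea, male_tea)
```

The Male/Tea cell is the one to watch: the pair never appears in the data, yet the table still contains it with a count of 0.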
🧠 Conceptual
Difficulty: intermediate
Understanding margins parameter in crosstab()
What does setting the parameter margins=True do in the pd.crosstab() function?
💡 Hint
Think about what 'margins' means in tables.
Explanation
The margins=True option adds an extra row and an extra column (both labeled 'All' by default) that hold the column totals, the row totals, and the grand total.
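To see the totals in practice, here is a small sketch that reuses the Gender/Preference data from the first challenge (an assumption made for illustration; the question itself does not supply data).

```python
import pandas as pd

gender = pd.Series(['Male', 'Female', 'Female', 'Male', 'Female'], name='Gender')
preference = pd.Series(['Coffee', 'Coffee', 'Tea', 'Coffee', 'Tea'], name='Preference')

# margins=True appends an 'All' row and an 'All' column.
with_totals = pd.crosstab(gender, preference, margins=True)
print(with_totals)

row_totals = with_totals['All']               # total per gender
col_totals = with_totals.loc['All']           # total per preference
grand_total = with_totals.loc['All', 'All']   # number of observations
print(grand_total)
```

The label can be changed with the margins_name parameter if 'All' clashes with a real category.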
🔧 Debug
Difficulty: advanced
Identify the error in crosstab() usage
What error, if any, will this code raise?
Data Analysis Python
import pandas as pd
ages = [23, 45, 31, 35]
genders = ['M', 'F', 'F', 'M']
# Valid: passing lists to crosstab
result = pd.crosstab(ages, genders, margins=True)
print(result)
💡 Hint
pd.crosstab accepts lists as array-like inputs.
Explanation
Passing lists directly to pd.crosstab() is valid as they are array-like. margins=True adds the margins successfully. No error is raised.
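Because plain lists carry no names, the axes of the result are unlabeled unless names are supplied. The sketch below passes the same lists and adds the rownames and colnames arguments; the labels 'Age' and 'Gender' are chosen here for illustration and are not part of the original challenge.

```python
import pandas as pd

ages = [23, 45, 31, 35]
genders = ['M', 'F', 'F', 'M']

# Lists are accepted as array-like inputs; rownames/colnames label the axes.
result = pd.crosstab(ages, genders,
                     rownames=['Age'], colnames=['Gender'],
                     margins=True)
print(result)

# Four distinct ages plus the 'All' row; 'F', 'M' plus the 'All' column.
print(result.shape)
```

Each age occurs once, so every row total is 1 and the grand total equals the length of the lists.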
❓ Visualization
Difficulty: advanced
Visualizing crosstab output with a heatmap
Which code snippet correctly creates a heatmap visualization of the crosstab result using seaborn?
Data Analysis Python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
data = {'City': ['NY', 'LA', 'NY', 'LA', 'NY'],
        'Transport': ['Car', 'Bus', 'Bus', 'Car', 'Car']}
df = pd.DataFrame(data)
ct = pd.crosstab(df['City'], df['Transport'])
💡 Hint
Heatmaps are good for showing tables with numbers.
Explanation
sns.heatmap() is designed to visualize matrix-like data such as crosstab outputs. annot=True shows the numbers on the heatmap.
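One way the heatmap call might look is sketched below. The colormap, title, and output filename are illustrative assumptions, and the Agg backend is selected only so the script also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    'City': ['NY', 'LA', 'NY', 'LA', 'NY'],
    'Transport': ['Car', 'Bus', 'Bus', 'Car', 'Car'],
})
ct = pd.crosstab(df['City'], df['Transport'])

# annot=True writes each count into its cell; fmt='d' formats them as integers.
ax = sns.heatmap(ct, annot=True, fmt='d', cmap='Blues')
ax.set_title('Transport by city')  # illustrative title
plt.tight_layout()
plt.savefig('crosstab_heatmap.png')  # hypothetical filename; use plt.show() interactively
```

The crosstab is passed to sns.heatmap() as-is, and its index and column labels become the axis tick labels.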
🚀 Application
Difficulty: expert
Using crosstab() to analyze survey data with normalization
You have a DataFrame with columns 'AgeGroup' and 'Satisfaction' from a survey. You want to see the proportion of each satisfaction level within each age group (rows sum to 1). Which crosstab() call achieves this?
Data Analysis Python
import pandas as pd
data = {'AgeGroup': ['18-25', '18-25', '26-35', '26-35', '26-35', '36-45'],
        'Satisfaction': ['High', 'Low', 'Medium', 'High', 'Low', 'High']}
df = pd.DataFrame(data)
💡 Hint
Normalization by 'index' means row-wise proportions.
Explanation
normalize='index' divides each row by its total, showing proportions within each age group.
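A short sketch of the row-wise normalization on the survey data follows; the variable name proportions is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'AgeGroup': ['18-25', '18-25', '26-35', '26-35', '26-35', '36-45'],
    'Satisfaction': ['High', 'Low', 'Medium', 'High', 'Low', 'High'],
})

# normalize='index' divides every count by its row total.
proportions = pd.crosstab(df['AgeGroup'], df['Satisfaction'], normalize='index')
print(proportions.round(2))

# Each age group's proportions add up to 1.
print(proportions.sum(axis=1))
```

Passing normalize='columns' would instead make each column sum to 1, and normalize='all' divides by the grand total.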