Challenge - 5 Problems
Cross-tabulation Master
❓ Predict Output
intermediate
Output of multi-index cross-tabulation with margins
What is the output of this code snippet using pandas crosstab with multi-index and margins?
Pandas
import pandas as pd

data = {
    'Gender': ['Male', 'Female', 'Female', 'Male', 'Female'],
    'AgeGroup': ['Adult', 'Adult', 'Child', 'Child', 'Adult'],
    'Preference': ['B', 'A', 'A', 'B', 'A'],
}
df = pd.DataFrame(data)
result = pd.crosstab(index=[df['Gender'], df['AgeGroup']], columns=df['Preference'], margins=True)
print(result)
💡 Hint
Look carefully at the counts for each Gender and AgeGroup combination and the Preference columns.
The crosstab puts each Gender and AgeGroup combination on a row and each Preference value on a column, and margins=True adds an "All" row and column of totals. The counts follow directly from the data: Female/Adult chose A twice, Female/Child chose A once, Male/Adult chose B once, and Male/Child chose B once, so the totals are A = 3, B = 2, and 5 overall.
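The counts in the explanation can be checked directly. The sketch below rebuilds the table from the snippet in this challenge and cross-checks each cell against a plain groupby. The names `pair_counts` and `grand_total` are introduced here for illustration only.

```python
import pandas as pd

data = {
    'Gender': ['Male', 'Female', 'Female', 'Male', 'Female'],
    'AgeGroup': ['Adult', 'Adult', 'Child', 'Child', 'Adult'],
    'Preference': ['B', 'A', 'A', 'B', 'A'],
}
df = pd.DataFrame(data)

result = pd.crosstab(
    index=[df['Gender'], df['AgeGroup']],
    columns=df['Preference'],
    margins=True,
)

# Independent count of each observed (Gender, AgeGroup, Preference) combination.
pair_counts = df.groupby(['Gender', 'AgeGroup', 'Preference']).size()

# Every observed combination should match the corresponding crosstab cell.
for (gender, age_group, preference), count in pair_counts.items():
    assert result.loc[(gender, age_group), preference] == count

# The bottom-right margin cell holds the total number of rows in df.
grand_total = result.iloc[-1, -1]
print(result)
print(grand_total)  # 5
```

Combinations that never occur in the data, such as Female/Adult with preference B, appear in the crosstab as 0 rather than being left out.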
❓ data_output
intermediate
Number of unique values in crosstab result
After running this code, how many unique values are in the resulting crosstab DataFrame?
Pandas
import pandas as pd

records = {
    'City': ['NY', 'LA', 'NY', 'LA', 'NY', 'LA'],
    'Product': ['X', 'X', 'Y', 'Y', 'X', 'Y'],
    'Sales': [10, 20, 10, 30, 20, 30],
}
df = pd.DataFrame(records)
ct = pd.crosstab(df['City'], df['Product'], values=df['Sales'], aggfunc='sum', dropna=False)
unique_values = ct.nunique().sum()
💡 Hint
Check the sums of sales for each City and Product combination.
The crosstab sums Sales for each City and Product pair: NY-X = 30, NY-Y = 10, LA-X = 20, LA-Y = 60. ct.nunique() counts the distinct values in each column (2 in X and 2 in Y), so unique_values is 4.
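To see where the 4 comes from, it helps to look at the per-column counts before they are summed. The sketch below uses the same data; `per_column` and `distinct_overall` are names introduced here for illustration.

```python
import pandas as pd

records = {
    'City': ['NY', 'LA', 'NY', 'LA', 'NY', 'LA'],
    'Product': ['X', 'X', 'Y', 'Y', 'X', 'Y'],
    'Sales': [10, 20, 10, 30, 20, 30],
}
df = pd.DataFrame(records)
ct = pd.crosstab(df['City'], df['Product'], values=df['Sales'], aggfunc='sum', dropna=False)

# Distinct values within each Product column.
per_column = ct.nunique()
print(per_column.to_dict())  # {'X': 2, 'Y': 2}

# What the challenge computes: the per-column counts added together.
unique_values = per_column.sum()

# Distinct values across the whole table, ignoring which column they sit in.
distinct_overall = len(set(ct.to_numpy().ravel().tolist()))
```

Both numbers are 4 here because no sum is repeated anywhere in the table. If the same value appeared in both columns, `ct.nunique().sum()` would count it twice while `distinct_overall` would count it once.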
❓ visualization
advanced
Visualizing crosstab with normalization
Which option shows the correct heatmap visualization code for a normalized crosstab of 'Department' vs 'Satisfaction'?
Pandas
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

data = {
    'Department': ['HR', 'HR', 'IT', 'IT', 'Sales', 'Sales', 'Sales'],
    'Satisfaction': ['High', 'Low', 'High', 'Low', 'High', 'Low', 'Low'],
}
df = pd.DataFrame(data)
ct = pd.crosstab(df['Department'], df['Satisfaction'], normalize='index')
💡 Hint
Normalization is by index, so each row sums to 1. An annotated heatmap makes the values easy to read.
Option A is correct: it plots the normalized crosstab with annotations and a suitable color map. Each of the other options falls short in one way: one transposes the table, which changes the meaning of the axes; one omits the annotations; and one removes the color bar, which makes the plot less informative.
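The answer options are not reproduced here, so the following is only a sketch of what an annotated heatmap of the normalized table could look like. The helper name `plot_normalized_heatmap`, the 'Blues' color map, the number format, and the title are all assumptions made for illustration, not the quiz's exact answer.

```python
import pandas as pd

data = {
    'Department': ['HR', 'HR', 'IT', 'IT', 'Sales', 'Sales', 'Sales'],
    'Satisfaction': ['High', 'Low', 'High', 'Low', 'High', 'Low', 'Low'],
}
df = pd.DataFrame(data)
ct = pd.crosstab(df['Department'], df['Satisfaction'], normalize='index')

# With normalize='index', every row holds shares that add up to 1.
row_sums = ct.sum(axis=1)


def plot_normalized_heatmap(table):
    """Draw an annotated heatmap of a row-normalized crosstab (hypothetical helper)."""
    # Imported here so the table above can be inspected without plotting libraries.
    import matplotlib.pyplot as plt
    import seaborn as sns

    # annot=True writes each share into its cell; the color bar is kept (seaborn's default).
    ax = sns.heatmap(table, annot=True, fmt='.2f', cmap='Blues', vmin=0, vmax=1)
    ax.set_title('Satisfaction share by Department')
    plt.tight_layout()
    return ax
```

Calling `plot_normalized_heatmap(ct)` followed by `plt.show()` displays the figure. Passing `ct` rather than `ct.T` keeps Department on the vertical axis, so each row of the plot still sums to 1.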
🔧 Debug
advanced
Identify the error in crosstab with missing values
What error will this code raise when running the crosstab with missing values in the data?
Pandas
import pandas as pd

data = {
    'Team': ['A', 'B', 'A', None, 'B'],
    'Result': ['Win', 'Lose', None, 'Win', 'Lose'],
}
df = pd.DataFrame(data)
ct = pd.crosstab(df['Team'], df['Result'])
print(ct)
💡 Hint
Check how pandas crosstab handles missing values by default.
By default, pandas crosstab drops any row where either grouping column is missing, so no error is raised. The resulting table simply has no row or column for NaN.
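The effect is easy to confirm by counting what survives. The sketch below also shows one possible way to keep the incomplete rows visible, by filling the gaps with a placeholder before tabulating. The label 'Missing' and the name `ct_filled` are assumptions chosen for illustration.

```python
import pandas as pd

data = {
    'Team': ['A', 'B', 'A', None, 'B'],
    'Result': ['Win', 'Lose', None, 'Win', 'Lose'],
}
df = pd.DataFrame(data)

# Default behavior: the two rows with a missing Team or Result are left out.
ct = pd.crosstab(df['Team'], df['Result'])
print(ct.to_numpy().sum())  # 3 of the 5 rows are counted

# Optional: replace missing entries with an explicit label so every row is counted.
filled = df.fillna('Missing')
ct_filled = pd.crosstab(filled['Team'], filled['Result'])
print(ct_filled.to_numpy().sum())  # 5
```

Whether to fill the gaps depends on the analysis: the default table describes only complete records, while the filled table makes the amount of missing data visible.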
🚀 Application
expert
Calculate weighted crosstab with custom aggregation
Given this DataFrame, which option correctly computes a weighted crosstab of 'Category' vs 'Type' using the sum of 'Weight' as aggregation?
Pandas
import pandas as pd

data = {
    'Category': ['X', 'X', 'Y', 'Y', 'Z', 'Z'],
    'Type': ['A', 'B', 'A', 'B', 'A', 'B'],
    'Weight': [1.5, 2.0, 3.0, 1.0, 2.5, 0.5],
}
df = pd.DataFrame(data)
💡 Hint
Check the correct parameter names for values and aggregation function in pandas crosstab.
Option D is correct: it passes values=df['Weight'] together with aggfunc='sum'. Each of the other options fails in one way: one omits the values parameter, one passes values='Weight' (a string rather than the column itself), which is invalid, and one uses aggfunc='mean' instead of sum.
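Based on the explanation, the correct call passes the Weight column as `values` and `'sum'` as `aggfunc`. The sketch below runs that call and compares it with an equivalent `pivot_table`. The names `weighted` and `pivot` are introduced here for illustration.

```python
import pandas as pd

data = {
    'Category': ['X', 'X', 'Y', 'Y', 'Z', 'Z'],
    'Type': ['A', 'B', 'A', 'B', 'A', 'B'],
    'Weight': [1.5, 2.0, 3.0, 1.0, 2.5, 0.5],
}
df = pd.DataFrame(data)

# Weighted crosstab: each cell is the sum of Weight for that Category/Type pair.
weighted = pd.crosstab(df['Category'], df['Type'], values=df['Weight'], aggfunc='sum')
print(weighted)

# The same table built with pivot_table, which takes column names instead of Series.
pivot = df.pivot_table(index='Category', columns='Type', values='Weight', aggfunc='sum')
```

Each Category/Type pair occurs exactly once in this data, so every cell equals the single Weight recorded for that pair, for example 1.5 for X/A and 0.5 for Z/B.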