crosstab() for cross-tabulation in Pandas - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output
intermediate
Output of simple crosstab() usage

What is the output of the following code?

import pandas as pd

data = {'Gender': ['Male', 'Female', 'Female', 'Male', 'Female'],
        'Preference': ['Coffee', 'Coffee', 'Tea', 'Coffee', 'Tea']}
df = pd.DataFrame(data)

result = pd.crosstab(df['Gender'], df['Preference'])
print(result)
A
Preference  Coffee  Tea
Gender                 
Female          1     2
Male            2     0
B
Preference  Coffee  Tea
Gender                 
Female          2     1
Male            1     1
C
Preference  Coffee  Tea
Gender                 
Female          1     2
Male            1     0
D
Preference  Coffee  Tea
Gender                 
Female          1     1
Male            2     1
💡 Hint

Count how many times each gender prefers each drink.
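
To make the counting step in the hint concrete, here is a minimal sketch on hypothetical data (the `pets` frame and its `City` and `Pet` columns are invented for illustration and are unrelated to the challenge). It suggests that a plain `pd.crosstab()` call tallies how often each pair of labels occurs, which can also be reproduced with `groupby()` and `size()`.

```python
import pandas as pd

# Hypothetical data, not the challenge data.
pets = pd.DataFrame({
    "City": ["Oslo", "Oslo", "Rome", "Rome", "Rome"],
    "Pet": ["Cat", "Dog", "Cat", "Cat", "Dog"],
})

# crosstab counts how often each (City, Pet) pair occurs.
counts = pd.crosstab(pets["City"], pets["Pet"])

# The same tally by hand: group on both columns, count rows per group,
# then pivot the inner level (Pet) into columns.
by_group = pets.groupby(["City", "Pet"]).size().unstack(fill_value=0)

print(counts)
print(by_group)
```

Applying the same row-by-row tally to the challenge data should lead to one of the four options.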

🧠 Conceptual
intermediate
Understanding margins in crosstab()

What does the margins=True parameter add to the output of pd.crosstab()?

A. It filters the data to only include rows with missing values.
B. It adds a row and column showing the total counts for each category.
C. It normalizes the counts to show proportions instead of raw counts.
D. It sorts the rows and columns alphabetically.
💡 Hint

Think about what 'margins' means in a table context.
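
One way to check your answer after attempting the question is to build the same table with and without `margins=True` and compare the two. The sketch below assumes a hypothetical `orders` frame with invented `Size` and `Color` columns.

```python
import pandas as pd

# Hypothetical data, invented for this comparison.
orders = pd.DataFrame({
    "Size": ["S", "M", "M", "L", "S", "M"],
    "Color": ["Red", "Red", "Blue", "Blue", "Blue", "Red"],
})

plain = pd.crosstab(orders["Size"], orders["Color"])
with_margins = pd.crosstab(orders["Size"], orders["Color"], margins=True)

# Compare shapes and labels to see what the parameter changed.
print(plain.shape, with_margins.shape)
print(list(with_margins.index), list(with_margins.columns))
print(with_margins)
```

The label used for the extra entries defaults to `All` and can be changed with the `margins_name` parameter.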

Predict Output
advanced
Output with values, aggfunc, and margins

What is the output of this code?

import pandas as pd

data = {'Team': ['A', 'A', 'B', 'B', 'A'],
        'Result': ['Win', 'Lose', 'Win', 'Lose', 'Win'],
        'Points': [3, 0, 3, 0, 3]}
df = pd.DataFrame(data)

result = pd.crosstab(df['Team'], df['Result'], values=df['Points'], aggfunc='sum', margins=True)
print(result)
A
Result  Lose  Win  All
Team                   
A          0    3    3
B          0    3    3
All        0    6    6
B
Result  Lose  Win  All
Team                   
A          0    9    9
B          0    3    3
All        0   12   12
C
Result  Lose  Win  All
Team                   
A          0    6    9
B          0    3    3
All        0    9   12
D
Result  Lose  Win  All
Team                   
A          0    6    6
B          0    3    3
All        0    9    9
💡 Hint

Sum points for each team and result, then add totals.
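
The hint's two steps (sum per cell, then add totals) can be sketched on a hypothetical `sales` frame; the `Region`, `Product`, and `Units` names are assumptions made for illustration. Each cell is computed by hand first, so the `crosstab()` result can be checked against it.

```python
import pandas as pd

# Hypothetical data, not the challenge data.
sales = pd.DataFrame({
    "Region": ["N", "N", "S", "S", "N"],
    "Product": ["Pen", "Ink", "Pen", "Ink", "Pen"],
    "Units": [2, 5, 1, 4, 3],
})

# Step 1: sum Units within each (Region, Product) cell by hand.
cell_sums = sales.groupby(["Region", "Product"])["Units"].sum()

# Step 2: let crosstab do the same and append row and column totals.
totals = pd.crosstab(
    sales["Region"],
    sales["Product"],
    values=sales["Units"],
    aggfunc="sum",
    margins=True,
)

print(cell_sums)
print(totals)
```

In this sketch the `All` entries are sums of `Units`, not row counts, because `aggfunc="sum"` is applied to the margins as well.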

🔧 Debug
advanced
Identify the error in crosstab() usage

What error will this code produce?

import pandas as pd

data = {'Category': ['X', 'Y', 'X'], 'Value': [10, 20, 30]}
df = pd.DataFrame(data)

result = pd.crosstab(df['Category'], df['Value'], aggfunc='sum')
print(result)
A. ValueError: aggfunc cannot be used without values.
B. No error, outputs a frequency table
C. ValueError: aggfunc must be a string, function or list of functions
D. KeyError: 'Value' column not found
💡 Hint

Check if 'values' parameter is provided when using 'aggfunc'.
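
Rather than guessing, you can observe the behavior directly once you have committed to an answer. The sketch below wraps the call from the question in `try`/`except` and records whatever happens; the `outcome` and `freq` names are introduced here for illustration, and the exact message text may vary between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({"Category": ["X", "Y", "X"], "Value": [10, 20, 30]})

# Run the call from the question and capture the exception, if any.
try:
    pd.crosstab(df["Category"], df["Value"], aggfunc="sum")
except Exception as exc:
    outcome = f"{type(exc).__name__}: {exc}"
else:
    outcome = "no exception"
print(outcome)

# For comparison: the same call without aggfunc is a plain frequency table.
freq = pd.crosstab(df["Category"], df["Value"])
print(freq)
```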

🚀 Application
expert
Using crosstab() to analyze survey data

You have survey data with columns AgeGroup and FavoriteFruit. You want to find the percentage distribution of favorite fruits within each age group. Which code snippet produces this result?

A. pd.crosstab(df['AgeGroup'], df['FavoriteFruit'], normalize=True)
B. pd.crosstab(df['AgeGroup'], df['FavoriteFruit'], normalize='columns')
C. pd.crosstab(df['AgeGroup'], df['FavoriteFruit'], normalize='index')
D. pd.crosstab(df['AgeGroup'], df['FavoriteFruit'], margins=True)
💡 Hint

Think about normalizing rows to get percentages within each age group.
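
To see how the three `normalize` settings differ, the sketch below builds each table on a small hypothetical `survey` frame (the five rows are invented; only the column names come from the question) and prints which sums come out to 1. Comparing row sums with column sums should indicate which setting gives a distribution within each age group.

```python
import pandas as pd

# Hypothetical survey responses, invented for illustration.
survey = pd.DataFrame({
    "AgeGroup": ["18-29", "18-29", "18-29", "30-49", "30-49"],
    "FavoriteFruit": ["Apple", "Apple", "Banana", "Banana", "Banana"],
})
rows, cols = survey["AgeGroup"], survey["FavoriteFruit"]

# One table per normalize setting.
tables = {
    "True": pd.crosstab(rows, cols, normalize=True),
    "index": pd.crosstab(rows, cols, normalize="index"),
    "columns": pd.crosstab(rows, cols, normalize="columns"),
}

for name, table in tables.items():
    print(f"normalize={name}")
    print("  row sums:   ", table.sum(axis=1).round(3).tolist())
    print("  column sums:", table.sum(axis=0).round(3).tolist())
```

Note that `normalize` returns fractions between 0 and 1; multiply the table by 100 if you want the values expressed as percentages.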