Challenge - 5 Problems

Crosstab Master

❓ Predict Output

intermediate

Output of a simple crosstab() call

What is the output of the following code?

import pandas as pd

data = {'Gender': ['Male', 'Female', 'Female', 'Male', 'Female'],
        'Preference': ['Coffee', 'Coffee', 'Tea', 'Coffee', 'Tea']}
df = pd.DataFrame(data)

result = pd.crosstab(df['Gender'], df['Preference'])
print(result)

A.
Preference  Coffee  Tea
Gender                 
Female           1    2
Male             2    0

B.
Preference  Coffee  Tea
Gender                 
Female           2    1
Male             1    1

C.
Preference  Coffee  Tea
Gender                 
Female           1    2
Male             1    0

D.
Preference  Coffee  Tea
Gender                 
Female           1    1
Male             2    1

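For intuition on what crosstab() counts, the sketch below builds a frequency table on a small hypothetical dataset (the City and Pet columns are invented for illustration, not taken from the quiz) and rebuilds the same counts with groupby().size().unstack(fill_value=0). This is offered as one way to cross-check a crosstab by hand, not as a description of how pandas implements it internally.

```python
import pandas as pd

# Hypothetical data, invented for illustration (not the quiz data).
df = pd.DataFrame({
    "City": ["Oslo", "Lima", "Oslo", "Lima", "Oslo", "Oslo"],
    "Pet": ["Cat", "Dog", "Dog", "Dog", "Cat", "Cat"],
})

# crosstab counts how many rows fall into each (City, Pet) pair.
table = pd.crosstab(df["City"], df["Pet"])

# Manual cross-check: count rows per (City, Pet) group, then pivot the
# Pet level into columns; pairs that never occur are filled with 0.
manual = df.groupby(["City", "Pet"]).size().unstack(fill_value=0)

print(table)
print(manual)
```

Both tables list the row keys and column keys in sorted order, and a combination that never occurs (Lima with Cat here) appears as 0 rather than being dropped.
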
🧠 Conceptual

intermediate

Understanding margins in crosstab()

What does the margins=True parameter add to the output of pd.crosstab()?

A. It filters the data to only include rows with missing values.

B. It adds a row and column showing the total counts for each category.

C. It normalizes the counts to show proportions instead of raw counts.

D. It sorts the rows and columns alphabetically.

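The sketch below shows margins=True on a small hypothetical dataset (the Store and Size columns are invented for illustration). The totals appear as one extra row and one extra column, labelled "All" by default; the margins_name parameter changes that label.

```python
import pandas as pd

# Hypothetical order data, invented for illustration.
df = pd.DataFrame({
    "Store": ["North", "South", "North", "South", "North"],
    "Size": ["S", "L", "L", "L", "S"],
})

# margins=True appends a total row and a total column ("All" by default).
with_totals = pd.crosstab(df["Store"], df["Size"], margins=True)
print(with_totals)

# margins_name relabels the total row and column.
renamed = pd.crosstab(df["Store"], df["Size"], margins=True, margins_name="Total")
print(renamed)
```

The bottom-right cell is the grand total, i.e. the number of rows in the data.
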
❓ Predict Output

advanced

Output with values, aggfunc, and margins

What is the output of this code?

import pandas as pd

data = {'Team': ['A', 'A', 'B', 'B', 'A'],
        'Result': ['Win', 'Lose', 'Win', 'Lose', 'Win'],
        'Points': [3, 0, 3, 0, 3]}
df = pd.DataFrame(data)

result = pd.crosstab(df['Team'], df['Result'], values=df['Points'], aggfunc='sum', margins=True)
print(result)

A.
Result  Lose  Win  All
Team                   
A          0    3    3
B          0    3    3
All        0    6    6

B.
Result  Lose  Win  All
Team                   
A          0    9    9
B          0    3    3
All        0   12   12

C.
Result  Lose  Win  All
Team                   
A          0    6    9
B          0    3    3
All        0    9   12

D.
Result  Lose  Win  All
Team                   
A          0    6    6
B          0    3    3
All        0    9    9

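When values and aggfunc are passed, each cell holds an aggregate of values for that row/column pair instead of a count, and with margins=True the All row and column apply the same aggregate to each row, each column, and the whole dataset. The sketch below uses invented sales data (Region, Product, Units are hypothetical) and, as a cross-check, computes the same summary with DataFrame.pivot_table, which accepts the same aggfunc and margins options.

```python
import pandas as pd

# Hypothetical sales data, invented for illustration.
df = pd.DataFrame({
    "Region": ["East", "West", "East", "West", "East"],
    "Product": ["Pen", "Pen", "Ink", "Ink", "Pen"],
    "Units": [5, 2, 1, 4, 3],
})

# Each cell: total Units for that (Region, Product) pair.
# margins=True adds row/column totals of the same sums.
xt = pd.crosstab(df["Region"], df["Product"], values=df["Units"],
                 aggfunc="sum", margins=True)

# The same summary built with DataFrame.pivot_table.
pt = df.pivot_table(index="Region", columns="Product", values="Units",
                    aggfunc="sum", margins=True)

print(xt)
print(pt)
```

Because the margins re-aggregate the underlying rows, a total only equals the sum of the cells when aggfunc is itself "sum"; with "mean", for example, the All column is the mean over each row group.
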
🔧 Debug

advanced

Identify the error in crosstab() usage

What error will this code produce?

import pandas as pd

data = {'Category': ['X', 'Y', 'X'], 'Value': [10, 20, 30]}
df = pd.DataFrame(data)

result = pd.crosstab(df['Category'], df['Value'], aggfunc='sum')
print(result)

A. ValueError: aggfunc cannot be used without values.

B. No error, outputs a frequency table

C. ValueError: aggfunc must be a string, function or list of functions

D. KeyError: 'Value' column not found

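pd.crosstab() treats values and aggfunc as a pair: passing either one without the other raises a ValueError, while passing both aggregates values per cell. The sketch below checks all three cases on invented data (Team, Quarter, Score are hypothetical); try_crosstab is a hypothetical helper written only for this demonstration, not part of pandas.

```python
import pandas as pd

# Hypothetical data, invented for illustration.
df = pd.DataFrame({
    "Team": ["Red", "Blue", "Red", "Blue"],
    "Quarter": ["Q1", "Q1", "Q2", "Q2"],
    "Score": [7, 3, 5, 9],
})


def try_crosstab(**kwargs):
    """Hypothetical helper: run pd.crosstab on Team x Quarter and print
    any ValueError instead of raising it. Returns None on failure."""
    try:
        return pd.crosstab(df["Team"], df["Quarter"], **kwargs)
    except ValueError as exc:
        print(f"ValueError: {exc}")
        return None


try_crosstab(aggfunc="sum")       # aggfunc without values: ValueError
try_crosstab(values=df["Score"])  # values without aggfunc: ValueError

# Both together: each cell is the summed Score for that Team/Quarter pair.
ok = try_crosstab(values=df["Score"], aggfunc="sum")
print(ok)
```

If only counts are needed, leave out aggfunc entirely; crosstab then returns a plain frequency table.
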
🚀 Application

expert

Using crosstab() to analyze survey data

You have survey data with columns AgeGroup and FavoriteFruit. You want to find the percentage distribution of favorite fruits within each age group. Which code snippet produces this result?

A. pd.crosstab(df['AgeGroup'], df['FavoriteFruit'], normalize=True)

B. pd.crosstab(df['AgeGroup'], df['FavoriteFruit'], normalize='columns')

C. pd.crosstab(df['AgeGroup'], df['FavoriteFruit'], normalize='index')

D. pd.crosstab(df['AgeGroup'], df['FavoriteFruit'], margins=True)

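The normalize parameter divides the counts by a total: "index" divides each row by its row total, "columns" divides each column by its column total, and True (or "all") divides by the grand total. The sketch below, on invented survey rows that reuse the AgeGroup and FavoriteFruit column names, compares the row-wise and column-wise versions. The results are fractions between 0 and 1, so they are multiplied by 100 here to read as percentages.

```python
import pandas as pd

# Hypothetical survey responses, invented for illustration.
df = pd.DataFrame({
    "AgeGroup": ["18-29", "18-29", "18-29", "30-49", "30-49"],
    "FavoriteFruit": ["Apple", "Banana", "Apple", "Banana", "Banana"],
})

# normalize="index": shares within each age group (each row sums to 1).
by_row = pd.crosstab(df["AgeGroup"], df["FavoriteFruit"], normalize="index")

# normalize="columns": shares within each fruit (each column sums to 1).
by_col = pd.crosstab(df["AgeGroup"], df["FavoriteFruit"], normalize="columns")

# Scale to percentages and round for display.
print((by_row * 100).round(1))
print((by_col * 100).round(1))
```

The two tables answer different questions: the row-normalized one describes fruit preferences within an age group, while the column-normalized one describes the age mix among fans of each fruit.
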