Challenge - 5 Problems

Cross-tabulation Master

❓ Predict Output

intermediate

Output of multi-index cross-tabulation with margins

What is the output of this code snippet using pandas crosstab with multi-index and margins?

Pandas

import pandas as pd

data = {'Gender': ['Male', 'Female', 'Female', 'Male', 'Female'],
        'AgeGroup': ['Adult', 'Adult', 'Child', 'Child', 'Adult'],
        'Preference': ['B', 'A', 'A', 'B', 'A']}
df = pd.DataFrame(data)

result = pd.crosstab(index=[df['Gender'], df['AgeGroup']], columns=df['Preference'], margins=True)
print(result)

A.
{'A': {'Female': {'Adult': 2, 'Child': 1}, 'Male': {'Adult': 0, 'Child': 0}, 'All': 3}, 'B': {'Female': {'Adult': 0, 'Child': 0}, 'Male': {'Adult': 1, 'Child': 1}, 'All': 2}, 'All': {'Female': {'Adult': 2, 'Child': 1}, 'Male': {'Adult': 1, 'Child': 1}, 'All': 5}}

B.
Preference       A  B  All
Gender AgeGroup           
Female Adult     2  0    2
       Child     1  0    1
Male   Adult     1  0    1
       Child     0  1    1
All              4  1    5

C.
Preference       A  B  All
Gender AgeGroup           
Female Adult     2  0    2
       Child     1  0    1
Male   Adult     0  1    1
       Child     0  1    1
All              3  2    5

D.
Preference       A  B  All
Gender AgeGroup           
Female Male      0  0    0
       Adult     2  0    2
       Child     1  0    1
Male   Adult     0  1    1
       Child     0  1    1
All              3  2    5
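
As a hint on how a two-level index and margins fit together, here is a minimal sketch on separate, hypothetical data (the `survey` frame and its column names are invented for illustration and are not part of the challenge). It assumes default `pd.crosstab` settings apart from `margins=True`.

```python
import pandas as pd

# Hypothetical survey data, unrelated to the question above
survey = pd.DataFrame({
    "Region": ["North", "North", "South", "South"],
    "Shift": ["Day", "Night", "Day", "Day"],
    "Answer": ["Yes", "No", "Yes", "Yes"],
})

# Passing a list of Series as `index` produces a two-level row index
table = pd.crosstab(
    index=[survey["Region"], survey["Shift"]],
    columns=survey["Answer"],
    margins=True,
)

print(table)

# Data rows are addressed by (Region, Shift) tuples
print(table.loc[("North", "Day"), "Yes"])

# The margin row is keyed ("All", ""); its "All" cell is the grand total
print(table.loc[("All", ""), "All"])
```

The margin row and column hold the column and row totals, so the bottom-right cell should equal the number of input rows.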

Attempts:

2 left

❓ Data Output

intermediate

Number of unique values in crosstab result

After running this code, what value does unique_values hold? Note that ct.nunique() counts the distinct values in each column, and .sum() adds those per-column counts together.

Pandas

import pandas as pd

records = {'City': ['NY', 'LA', 'NY', 'LA', 'NY', 'LA'],
           'Product': ['X', 'X', 'Y', 'Y', 'X', 'Y'],
           'Sales': [10, 20, 10, 30, 20, 30]}
df = pd.DataFrame(records)

ct = pd.crosstab(df['City'], df['Product'], values=df['Sales'], aggfunc='sum', dropna=False)
unique_values = ct.nunique().sum()
print(unique_values)
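
The sum of per-column distinct counts is not always the same as the number of distinct values in the whole table. The sketch below shows the difference on a small hypothetical frame (`frame` and its values are invented for illustration and are not the crosstab from the question).

```python
import pandas as pd

# Hypothetical frame where the value 5 appears in both columns
frame = pd.DataFrame({"X": [5, 5, 7], "Y": [5, 8, 9]})

# Distinct values counted column by column: X -> {5, 7}, Y -> {5, 8, 9}
per_column = frame.nunique()
summed = per_column.sum()

# Distinct values across the whole table: {5, 7, 8, 9}
overall = pd.unique(frame.to_numpy().ravel()).size

print(per_column.to_dict())  # {'X': 2, 'Y': 3}
print(summed)                # 5
print(overall)               # 4
```

The two numbers agree only when no value is shared between columns.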

Attempts:

2 left

❓ Visualization

advanced

Visualizing crosstab with normalization

Which option shows the correct heatmap visualization code for a normalized crosstab of 'Department' vs 'Satisfaction'?

Pandas

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

data = {'Department': ['HR', 'HR', 'IT', 'IT', 'Sales', 'Sales', 'Sales'],
        'Satisfaction': ['High', 'Low', 'High', 'Low', 'High', 'Low', 'Low']}
df = pd.DataFrame(data)

ct = pd.crosstab(df['Department'], df['Satisfaction'], normalize='index')

A.
sns.heatmap(ct, annot=True, cmap='coolwarm')
plt.show()

B.
sns.heatmap(ct.T, annot=True, cmap='viridis')
plt.show()

C.
sns.heatmap(ct, annot=False, cmap='coolwarm')
plt.show()

D.
sns.heatmap(ct, annot=True, cmap='Blues', cbar=False)
plt.show()
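
Before plotting, it can help to check what the `normalize` argument does to the table. This is a minimal sketch on hypothetical data (the `orders` frame is invented for illustration); it uses only pandas and leaves the plotting step out.

```python
import pandas as pd

# Hypothetical data, unrelated to the question above
orders = pd.DataFrame({
    "Store": ["East", "East", "East", "West"],
    "Status": ["Paid", "Paid", "Open", "Open"],
})

# normalize="index" turns each row into proportions of that row's total
by_row = pd.crosstab(orders["Store"], orders["Status"], normalize="index")

# normalize="columns" turns each column into proportions of that column's total
by_col = pd.crosstab(orders["Store"], orders["Status"], normalize="columns")

print(by_row)
print(by_row.sum(axis=1))  # every row sums to 1 (up to float rounding)
print(by_col.sum(axis=0))  # every column sums to 1 (up to float rounding)
```

With `normalize='index'`, each heatmap row would therefore show how one row category splits across the column categories.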

🔧 Debug

advanced

Identify the error in crosstab with missing values

What error will this code raise when running the crosstab with missing values in the data?

Pandas

import pandas as pd

data = {'Team': ['A', 'B', 'A', None, 'B'],
        'Result': ['Win', 'Lose', None, 'Win', 'Lose']}
df = pd.DataFrame(data)

ct = pd.crosstab(df['Team'], df['Result'])
print(ct)

A. KeyError

B. No error, prints crosstab with NaN rows/columns dropped

C. TypeError

D. ValueError

🚀 Application

expert

Calculate weighted crosstab with custom aggregation

Given this DataFrame, which option correctly computes a weighted crosstab of 'Category' vs 'Type' using the sum of 'Weight' as aggregation?

Pandas

import pandas as pd

data = {'Category': ['X', 'X', 'Y', 'Y', 'Z', 'Z'],
        'Type': ['A', 'B', 'A', 'B', 'A', 'B'],
        'Weight': [1.5, 2.0, 3.0, 1.0, 2.5, 0.5]}
df = pd.DataFrame(data)

A. pd.crosstab(index=df['Category'], columns=df['Type'], values=df['Weight'], aggfunc='mean')

B. pd.crosstab(df['Category'], df['Type'], aggfunc='sum')

C. pd.crosstab(df['Category'], df['Type'], values='Weight', aggfunc='sum')

D. pd.crosstab(df['Category'], df['Type'], values=df['Weight'], aggfunc='sum')
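
One way to reason about an aggregated crosstab is to compare it with the equivalent `groupby` followed by `unstack`. The sketch below does this on hypothetical data (the `shipments` frame is invented for illustration) and deliberately uses `'max'` rather than the aggregation the question asks for.

```python
import pandas as pd

# Hypothetical data, unrelated to the question above
shipments = pd.DataFrame({
    "Depot": ["N", "N", "N", "S"],
    "Mode": ["Air", "Air", "Sea", "Sea"],
    "Tons": [1.0, 2.0, 4.0, 8.0],
})

# Aggregate the Tons values that fall into each (Depot, Mode) cell
via_crosstab = pd.crosstab(
    shipments["Depot"],
    shipments["Mode"],
    values=shipments["Tons"],
    aggfunc="max",
)

# The same table built by grouping on both keys and pivoting Mode into columns
via_groupby = shipments.groupby(["Depot", "Mode"])["Tons"].max().unstack()

print(via_crosstab)
print(via_groupby)
```

Cells with no matching rows, such as depot S with mode Air here, come out as NaN in both tables because no fill value is applied.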
