Challenge - 5 Problems

❓ Data Output

intermediate

Output of simple crosstab() with two categorical columns

Given the DataFrame below, what does the pd.crosstab() call print?

import pandas as pd

data = {'Gender': ['Male', 'Female', 'Female', 'Male', 'Female'],
        'Preference': ['Coffee', 'Coffee', 'Tea', 'Coffee', 'Tea']}
df = pd.DataFrame(data)

result = pd.crosstab(df['Gender'], df['Preference'])
print(result)

A.
Preference  Coffee  Tea
Gender                 
Female          1    2
Male            2    0

B.
Preference  Coffee  Tea
Gender                 
Female          2    1
Male            1    2

C.
Preference  Coffee  Tea
Gender                 
Female          1    3
Male            2    0

D.
Preference  Coffee  Tea
Gender                 
Female          0    2
Male            2    1

🧠 Conceptual

intermediate

Understanding margins parameter in crosstab()

What does setting the parameter margins=True do in the pd.crosstab() function?

A. Normalizes the crosstab values to show proportions instead of counts.

B. Adds a row and column with totals (sum) for each category and overall.

C. Filters the crosstab to only show rows with more than 5 counts.

D. Sorts the crosstab rows and columns alphabetically.

🔧 Debug

advanced

Identify the error in crosstab() usage

What happens when this code runs?

import pandas as pd

ages = [23, 45, 31, 35]
genders = ['M', 'F', 'F', 'M']

# Pass plain Python lists (not Series) to crosstab
result = pd.crosstab(ages, genders, margins=True)
print(result)

A. No error, prints the crosstab table

B. TypeError: unhashable type: 'list'

C. TypeError: Cannot interpret 'ages' as a data frame column

D. ValueError: Index contains duplicate entries, cannot reshape

❓ Visualization

advanced

Visualizing crosstab output with a heatmap

Which code snippet correctly creates a heatmap visualization of the crosstab result using seaborn?

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

data = {'City': ['NY', 'LA', 'NY', 'LA', 'NY'],
        'Transport': ['Car', 'Bus', 'Bus', 'Car', 'Car']}
df = pd.DataFrame(data)

ct = pd.crosstab(df['City'], df['Transport'])

A.
plt.plot(ct)
plt.show()

B.
sns.barplot(data=ct)
plt.show()

C.
sns.heatmap(ct, annot=True)
plt.show()

D.
sns.scatterplot(data=ct)
plt.show()

🚀 Application

expert

Using crosstab() to analyze survey data with normalization

You have a DataFrame with columns 'AgeGroup' and 'Satisfaction' from a survey. You want to see the proportion of each satisfaction level within each age group (rows sum to 1). Which crosstab() call achieves this?

import pandas as pd

data = {'AgeGroup': ['18-25', '18-25', '26-35', '26-35', '26-35', '36-45'],
        'Satisfaction': ['High', 'Low', 'Medium', 'High', 'Low', 'High']}
df = pd.DataFrame(data)

A. pd.crosstab(df['AgeGroup'], df['Satisfaction'], margins=True, normalize='all')

B. pd.crosstab(df['AgeGroup'], df['Satisfaction'], normalize='columns')

C. pd.crosstab(df['AgeGroup'], df['Satisfaction'], normalize=True)

D. pd.crosstab(df['AgeGroup'], df['Satisfaction'], normalize='index')
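
After attempting the challenges, one way to check the margins and normalize questions is to run the calls yourself. The sketch below is only an illustration: it reuses the survey data from this challenge, and the variable names (counts, by_index, by_columns, overall) are made up for the example. It prints the sums along each axis so you can see which setting makes the rows, the columns, or the whole table sum to 1.

```python
import pandas as pd

# Same survey data as in the challenge above.
data = {'AgeGroup': ['18-25', '18-25', '26-35', '26-35', '26-35', '36-45'],
        'Satisfaction': ['High', 'Low', 'Medium', 'High', 'Low', 'High']}
df = pd.DataFrame(data)

# Plain counts with margins=True, to inspect what the extra row/column holds.
counts = pd.crosstab(df['AgeGroup'], df['Satisfaction'], margins=True)

# One table per normalize setting (illustrative variable names).
by_index = pd.crosstab(df['AgeGroup'], df['Satisfaction'], normalize='index')
by_columns = pd.crosstab(df['AgeGroup'], df['Satisfaction'], normalize='columns')
overall = pd.crosstab(df['AgeGroup'], df['Satisfaction'], normalize='all')

print(counts)
print(by_index.sum(axis=1))      # sum across each row
print(by_columns.sum(axis=0))    # sum down each column
print(overall.to_numpy().sum())  # sum of every cell
```

Comparing the three printed sums against the wording of the question ("rows sum to 1") is usually enough to tell the settings apart.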
