Complete the code to create a cross-tabulation table between 'Gender' and 'Preference'.
import pandas as pd
data = {'Gender': ['Male', 'Female', 'Female', 'Male'], 'Preference': ['A', 'B', 'A', 'B']}
df = pd.DataFrame(data)
result = pd.crosstab(df['Gender'], df[[1]])
print(result)
The crosstab() function is used to compute a simple cross-tabulation of two (or more) factors. Here, we want to cross-tabulate 'Gender' with 'Preference', so the second argument should be df['Preference'].
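For reference, the completed exercise might look like the sketch below. It reuses the data from the exercise unchanged and only fills in the blank with 'Preference'.

```python
import pandas as pd

data = {'Gender': ['Male', 'Female', 'Female', 'Male'],
        'Preference': ['A', 'B', 'A', 'B']}
df = pd.DataFrame(data)

# Rows come from the first argument, columns from the second.
# Each cell counts how many records share that (Gender, Preference) pair.
result = pd.crosstab(df['Gender'], df['Preference'])
print(result)
```

With this data every Gender/Preference pair occurs exactly once, so each cell of the table is 1.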
Complete the code to add margins (totals) to the cross-tabulation table.
import pandas as pd
data = {'Team': ['A', 'B', 'A', 'B'], 'Result': ['Win', 'Lose', 'Lose', 'Win']}
df = pd.DataFrame(data)
ct = pd.crosstab(df['Team'], df['Result'], [1]=True)
print(ct)
The margins=True parameter adds row and column totals to the cross-tabulation table.
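A minimal sketch of the completed call, using the same data as the exercise. The second table uses the margins_name parameter of crosstab() to relabel the totals; the label 'Total' is only an illustrative choice.

```python
import pandas as pd

data = {'Team': ['A', 'B', 'A', 'B'],
        'Result': ['Win', 'Lose', 'Lose', 'Win']}
df = pd.DataFrame(data)

# margins=True appends an 'All' row and an 'All' column holding the totals.
ct = pd.crosstab(df['Team'], df['Result'], margins=True)
print(ct)

# Optional: rename the totals label (the name 'Total' is an arbitrary example).
ct_named = pd.crosstab(df['Team'], df['Result'],
                       margins=True, margins_name='Total')
print(ct_named)
```

The bottom-right cell holds the grand total, which here equals the number of rows in df.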
Fix the error in the code to correctly compute the cross-tabulation of 'City' and 'Category'.
import pandas as pd
data = {'City': ['NY', 'LA', 'NY', 'LA'], 'Category': ['Food', 'Food', 'Tech', 'Tech']}
df = pd.DataFrame(data)
ct = pd.crosstab(df.City, df.[1])
print(ct)
When accessing a DataFrame column as an attribute, write the column name exactly as it appears, without quotes. Here, df.Category is correct; df.'Category' is a syntax error.
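The sketch below, built on the exercise data, shows that attribute access and bracket access select the same columns and therefore produce the same table. The variable names ct_attr and ct_brackets are illustrative only.

```python
import pandas as pd

data = {'City': ['NY', 'LA', 'NY', 'LA'],
        'Category': ['Food', 'Food', 'Tech', 'Tech']}
df = pd.DataFrame(data)

# Attribute access: the column name is written bare, with no quotes.
ct_attr = pd.crosstab(df.City, df.Category)

# Bracket access: the column name is passed as a string.
ct_brackets = pd.crosstab(df['City'], df['Category'])

print(ct_attr)
print(ct_attr.equals(ct_brackets))
```

Attribute access only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute or method; bracket access works for any column name.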
Fill both blanks to create a normalized cross-tabulation table showing row proportions.
import pandas as pd
data = {'Group': ['X', 'Y', 'X', 'Y'], 'Outcome': ['Pass', 'Fail', 'Fail', 'Pass']}
df = pd.DataFrame(data)
ct = pd.crosstab(df[[1]], df[[2]], normalize='index')
print(ct)
To normalize by rows, use normalize='index'. The first argument is the row variable ('Group'), the second is the column variable ('Outcome').
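As a sketch, the completed call is shown below next to the two other string values that normalize accepts, 'columns' and 'all', so the difference is visible. The extra variable names (ct_cols, ct_all) are illustrative and not part of the exercise.

```python
import pandas as pd

data = {'Group': ['X', 'Y', 'X', 'Y'],
        'Outcome': ['Pass', 'Fail', 'Fail', 'Pass']}
df = pd.DataFrame(data)

# normalize='index': each row is divided by its row total, so rows sum to 1.
ct = pd.crosstab(df['Group'], df['Outcome'], normalize='index')
print(ct)

# normalize='columns': each column sums to 1 instead.
ct_cols = pd.crosstab(df['Group'], df['Outcome'], normalize='columns')

# normalize='all': every cell is divided by the grand total.
ct_all = pd.crosstab(df['Group'], df['Outcome'], normalize='all')

print(ct.sum(axis=1))       # row sums of the row-normalized table
print(ct_cols.sum(axis=0))  # column sums of the column-normalized table
```

In this data each group has one 'Pass' and one 'Fail', so every cell of the row-normalized table is 0.5.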
Fill all three blanks to create a cross-tabulation table with margins and specify the aggregation function to count unique 'UserID's.
import pandas as pd
data = {'UserID': [1, 2, 1, 3, 2], 'Product': ['A', 'A', 'B', 'B', 'A'], 'Region': ['East', 'West', 'East', 'West', 'East']}
df = pd.DataFrame(data)
ct = pd.crosstab(df[[1]], df[[2]], values=df[[3]], aggfunc='nunique', margins=True)
print(ct)
The first two blanks are the row and column variables ('Region' and 'Product'). The third blank is the column of values to aggregate ('UserID'). Using aggfunc='nunique' counts unique users per group.
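A sketch of the completed call on the exercise data follows. The comments trace which UserIDs fall into each cell so the counts can be checked by hand.

```python
import pandas as pd

data = {'UserID': [1, 2, 1, 3, 2],
        'Product': ['A', 'A', 'B', 'B', 'A'],
        'Region': ['East', 'West', 'East', 'West', 'East']}
df = pd.DataFrame(data)

# values= and aggfunc= must be given together: instead of counting rows,
# each cell aggregates the UserID values that fall into it.
ct = pd.crosstab(df['Region'], df['Product'],
                 values=df['UserID'], aggfunc='nunique', margins=True)
print(ct)

# Hand check of the inner cells:
#   East / A -> users {1, 2} -> 2
#   East / B -> users {1}    -> 1
#   West / A -> users {2}    -> 1
#   West / B -> users {3}    -> 1
```

Note that with an aggregation such as 'nunique', the 'All' entries should be read as the aggregate over the whole row or column rather than as a sum of the cells, so they need not add up the way plain counts do.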