Data Analysis Python, ~30 mins

Chi-squared test in Data Analysis Python - Mini Project: Build & Apply

Chi-squared Test for Independence
📖 Scenario: You work in a marketing team. You want to check whether customers' preference between two products depends on their age group. You have collected counts of how many customers in each age group prefer each product.
🎯 Goal: Build a Python program that uses a chi-squared test to check if product preference depends on age group.
📋 What You'll Learn
Create a contingency table as a dictionary of dictionaries with exact counts
Create a variable for the significance level
Use scipy.stats.chi2_contingency to perform the chi-squared test
Print the chi-squared statistic and p-value
💡 Why This Matters
🌍 Real World
Chi-squared tests are used in marketing, medicine, and social sciences to check if two categorical variables are related.
💼 Career
Data analysts and scientists use chi-squared tests to analyze survey data and customer preferences to make informed decisions.
1
Create the contingency table data
Create a dictionary called preference_data with keys as age groups: '18-25', '26-35', and '36-45'. Each key should map to another dictionary with keys 'Product A' and 'Product B' and these exact counts: '18-25': {'Product A': 30, 'Product B': 20}, '26-35': {'Product A': 25, 'Product B': 25}, '36-45': {'Product A': 20, 'Product B': 30}.
Hint

Think of preference_data as a table with age groups as rows and product preferences as columns.
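A minimal sketch of the dictionary described in this step, using the exact counts given above. The loop at the end is only an illustrative check of the row and column layout and is not required by the step.

```python
# Contingency counts: outer keys are age groups (rows),
# inner keys are products (columns).
preference_data = {
    '18-25': {'Product A': 30, 'Product B': 20},
    '26-35': {'Product A': 25, 'Product B': 25},
    '36-45': {'Product A': 20, 'Product B': 30},
}

# Print one row per age group to confirm the table layout.
for age_group, counts in preference_data.items():
    print(age_group, counts['Product A'], counts['Product B'])
```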

2
Set the significance level
Create a variable called alpha and set it to 0.05 to represent the significance level for the chi-squared test.
Hint

A significance level of 0.05 is a common convention. A p-value below alpha is treated as evidence against independence.
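As a sketch of how alpha is typically used once a p-value is available: the helper is_significant below is a hypothetical name introduced only for illustration and is not part of the project steps.

```python
# Significance level for the test.
alpha = 0.05

def is_significant(p_value, threshold=alpha):
    """Return True when the p-value falls below the significance level."""
    return p_value < threshold

print(is_significant(0.01))  # True: 0.01 is below 0.05
print(is_significant(0.20))  # False: 0.20 is above 0.05
```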

3
Perform the chi-squared test
Import chi2_contingency from scipy.stats. Convert preference_data into a 2D list called table, with one row per age group and columns for 'Product A' and 'Product B' in that order. Call chi2_contingency(table), which returns four values: the chi-squared statistic, the p-value, the degrees of freedom, and the expected counts. Store the first two in variables chi2 and p.
Hint

Remember to import chi2_contingency before using it. The table must be a list of lists with counts in the right order.
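One possible way to carry out this step, assuming SciPy is installed. The names products, dof, and expected are illustrative choices that the step itself does not require.

```python
from scipy.stats import chi2_contingency

preference_data = {
    '18-25': {'Product A': 30, 'Product B': 20},
    '26-35': {'Product A': 25, 'Product B': 25},
    '36-45': {'Product A': 20, 'Product B': 30},
}

# Fix the column order explicitly so every row is laid out the same way.
products = ['Product A', 'Product B']
table = [[counts[product] for product in products]
         for counts in preference_data.values()]

# chi2_contingency returns the statistic, the p-value, the degrees of
# freedom, and the expected counts under independence.
chi2, p, dof, expected = chi2_contingency(table)
```

For a table with 3 rows and 2 columns, the degrees of freedom are (3 - 1) * (2 - 1) = 2.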

4
Print the chi-squared test results
Print the chi-squared statistic with the text Chi-squared statistic: followed by the value of chi2. Then print the p-value with the text p-value: followed by the value of p. Use print statements.
Hint

Use print(f"Chi-squared statistic: {chi2}") and similarly for p-value.
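A sketch of the print statements, with chi2 and p recomputed by hand so the snippet runs without SciPy and can serve as a cross-check. It rests on one assumption that holds for this table only because it has 2 degrees of freedom: in that case the chi-squared p-value has the closed form exp(-chi2 / 2). In the project itself, chi2 and p come from chi2_contingency in step 3.

```python
import math

preference_data = {
    '18-25': {'Product A': 30, 'Product B': 20},
    '26-35': {'Product A': 25, 'Product B': 25},
    '36-45': {'Product A': 20, 'Product B': 30},
}
products = ['Product A', 'Product B']
table = [[counts[product] for product in products]
         for counts in preference_data.values()]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand_total = sum(row_totals)

# Sum (observed - expected)^2 / expected over every cell, where the
# expected count under independence is row_total * col_total / grand_total.
chi2 = 0.0
for i, row in enumerate(table):
    for j, observed in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi2 += (observed - expected) ** 2 / expected

# Assumption: valid only for 2 degrees of freedom, as in this 3x2 table.
p = math.exp(-chi2 / 2)

print(f"Chi-squared statistic: {chi2}")
print(f"p-value: {p}")
```

Every expected count here is 25, so the statistic is 4.0 and the p-value is about 0.135. That is above a significance level of 0.05, so these counts do not give evidence at that level that product preference depends on age group.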