How to Analyze Survey Data in Python: Simple Steps and Example

To analyze survey data in Python, load and clean the data with pandas, then compute simple statistics such as the mean, median, and frequency counts with pandas and numpy. Visualization libraries such as matplotlib or seaborn help reveal patterns in the responses.

📐

Syntax

Use pandas.read_csv() to load survey data from a CSV file. Use DataFrame.describe() for summary statistics. Use value_counts() to count responses. Use matplotlib.pyplot or seaborn for charts.

pd.read_csv('file.csv'): Load data
df.describe(): Get count, mean, std, min, quartiles, and max for numeric columns
df['column'].value_counts(): Count unique answers
plt.bar(), sns.countplot(): Visualize data

python

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load survey data
survey_df = pd.read_csv('survey.csv')

# Summary statistics
summary = survey_df.describe()

# Count answers in a column
counts = survey_df['Question1'].value_counts()

# Plot counts
sns.countplot(x='Question1', data=survey_df)
plt.show()
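
The syntax list mentions plt.bar(), while the snippet above only uses sns.countplot(). As a minimal sketch of the matplotlib-only route, the counts from value_counts() can be passed straight to a bar chart. The sample answers, the category order, and the output file name here are illustrative assumptions, not part of any real survey.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical answers to one survey question
answers = pd.Series(['Good', 'Excellent', 'Good', 'Poor', 'Excellent'],
                    name='Satisfaction')

# Count each answer, then put the categories in a meaningful order
order = ['Poor', 'Good', 'Excellent']
counts = answers.value_counts().reindex(order, fill_value=0)

# Draw one bar per answer category
fig, ax = plt.subplots()
bars = ax.bar(counts.index, counts.values)
ax.set_xlabel('Satisfaction')
ax.set_ylabel('Number of responses')
ax.set_title('Survey Satisfaction Counts')

# Save to a file (hypothetical name); use plt.show() instead to open a window
fig.savefig('satisfaction_counts.png')
```

Reindexing with an explicit order keeps ordinal answers (Poor, Good, Excellent) in a sensible sequence instead of sorting them by frequency.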

💻

Example

This example shows how to load survey data, get basic statistics, count answers for a question, and plot the results.

python

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Sample survey data as dictionary
data = {
    'Age': [25, 30, 22, 40, 28],
    'Satisfaction': ['Good', 'Excellent', 'Good', 'Poor', 'Excellent'],
    'Recommend': ['Yes', 'Yes', 'No', 'No', 'Yes']
}

# Create DataFrame
survey_df = pd.DataFrame(data)

# Show summary statistics for numeric columns
print(survey_df.describe())

# Count how many answered each satisfaction level
print(survey_df['Satisfaction'].value_counts())

# Plot satisfaction counts
sns.countplot(x='Satisfaction', data=survey_df)
plt.title('Survey Satisfaction Counts')
plt.show()

Output

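             Age
count   5.000000
mean   29.000000
std     6.855655
min    22.000000
25%    25.000000
50%    28.000000
75%    30.000000
max    40.000000
Excellent    2
Good         2
Poor         1
Name: Satisfaction, dtype: int64

The order of tied answers may differ between pandas versions. In pandas 2.0 and later, the column name Satisfaction is printed above the counts and the last line reads Name: count, dtype: int64.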
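
The introduction also mentions the median and frequency counts. The sketch below reuses the same sample data to show, under those assumptions, how the median, the percentage share of each answer, and a simple cross-tabulation of two questions might be computed. The variable names median_age, shares, and table are illustrative only.

```python
import pandas as pd

# Same sample survey data as in the example above
survey_df = pd.DataFrame({
    'Age': [25, 30, 22, 40, 28],
    'Satisfaction': ['Good', 'Excellent', 'Good', 'Poor', 'Excellent'],
    'Recommend': ['Yes', 'Yes', 'No', 'No', 'Yes'],
})

# Median is less sensitive to outliers than the mean
median_age = survey_df['Age'].median()

# normalize=True returns proportions; convert them to percentages
shares = (survey_df['Satisfaction']
          .value_counts(normalize=True)
          .mul(100)
          .round(1))

# Cross-tabulate two questions to see how answers combine
table = pd.crosstab(survey_df['Satisfaction'], survey_df['Recommend'])

print(median_age)
print(shares)
print(table)
```

Percentages are often easier to compare than raw counts when groups or survey waves have different numbers of respondents.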

⚠️

Common Pitfalls

A common mistake is analyzing data before cleaning it, which leaves missing values and inconsistent answers (for example, 'good' and 'Good') in the results. Forgetting to convert data types (e.g., strings to numbers) can cause errors or exclude a column from numeric summaries. Plotting without inspecting the data first can also produce confusing charts.

Always check for missing data with df.isnull().sum() and clean or fill missing values before analysis.

python

import pandas as pd

# Wrong: Not handling missing data
survey_df = pd.DataFrame({'Age': [25, None, 22], 'Satisfaction': ['Good', 'Excellent', None]})
print(survey_df.describe())  # describe() silently skips NaN: count is 2, not 3

# Right: Fill missing data before analysis
survey_df['Age'] = survey_df['Age'].fillna(survey_df['Age'].mean())
survey_df['Satisfaction'] = survey_df['Satisfaction'].fillna('Unknown')
print(survey_df.describe())

Output

             Age
count   2.000000
mean   23.500000
std     2.121320
min    22.000000
25%    22.750000
50%    23.500000
75%    24.250000
max    25.000000
             Age
count   3.000000
mean   23.500000
std     1.500000
min    22.000000
25%    22.750000
50%    23.500000
75%    24.250000
max    25.000000
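
The pitfalls above also mention inconsistent answers and unconverted data types. The following is a minimal sketch of one way to handle both, assuming a hypothetical raw table in which ages arrive as text and satisfaction labels vary in case and spacing. The column values are invented for illustration.

```python
import pandas as pd

# Hypothetical raw survey export: ages stored as text, labels typed inconsistently
raw = pd.DataFrame({
    'Age': ['25', '30', 'unknown', '40'],
    'Satisfaction': [' good', 'Good', 'EXCELLENT', 'excellent '],
})

clean = raw.copy()

# Convert text to numbers; values that cannot be parsed become NaN
clean['Age'] = pd.to_numeric(clean['Age'], errors='coerce')

# Strip stray spaces and unify capitalization so equal answers match
clean['Satisfaction'] = clean['Satisfaction'].str.strip().str.title()

print(clean.dtypes)                           # verify the converted types
print(clean.isnull().sum())                   # Age now has one missing value
print(clean['Satisfaction'].value_counts())   # two categories instead of four
```

With errors='coerce', unparseable entries show up as missing values, so the df.isnull().sum() check catches them instead of the conversion raising an error.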

📊

Quick Reference

Load data: pd.read_csv('file.csv')
Summary stats: df.describe()
Count values: df['col'].value_counts()
Check missing: df.isnull().sum()
Fill missing: df.fillna(value)
Plot counts: sns.countplot(x='col', data=df)

✅

Key Takeaways

Use pandas to load and summarize survey data easily.

Check and handle missing or inconsistent data before analysis.

Use value_counts() to count survey responses quickly.

Visualize data with seaborn or matplotlib for better insights.

Always verify data types and clean data for accurate results.