How to Analyze Survey Data in Python: Simple Steps and Example
To analyze survey data in Python, use pandas to load and clean the data, then compute simple statistics such as the mean, median, and frequency counts with pandas and numpy. Visualization libraries such as matplotlib or seaborn help reveal patterns in the responses.
Syntax
Use pandas.read_csv() to load survey data from a CSV file. Use DataFrame.describe() for summary statistics. Use value_counts() to count responses. Use matplotlib.pyplot or seaborn for charts.
- pd.read_csv('file.csv'): Load data
- df.describe(): Get stats like mean, std, min, max
- df['column'].value_counts(): Count unique answers
- plt.bar(), sns.countplot(): Visualize data
```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load survey data
survey_df = pd.read_csv('survey.csv')

# Summary statistics
summary = survey_df.describe()

# Count answers in a column
counts = survey_df['Question1'].value_counts()

# Plot counts
sns.countplot(x='Question1', data=survey_df)
plt.show()
```
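The snippet above needs a survey.csv file on disk. As a minimal sketch that runs without one, the same calls can be tried on CSV text held in memory; the column names and values below are made up for illustration and are not from a real survey.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for survey.csv
csv_text = """Respondent,Question1,Score
1,Yes,4
2,No,2
3,Yes,5
"""

# read_csv accepts a file-like object as well as a path
survey_df = pd.read_csv(io.StringIO(csv_text))

# Summary statistics for one numeric column
summary = survey_df['Score'].describe()

# Frequency of each answer to Question1
counts = survey_df['Question1'].value_counts()

print(survey_df.shape)
print(summary)
print(counts)
```

Swapping `io.StringIO(csv_text)` for a real path such as 'survey.csv' should give the same workflow on actual data.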
Example
This example shows how to load survey data, get basic statistics, count answers for a question, and plot the results.
```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Sample survey data as dictionary
data = {
    'Age': [25, 30, 22, 40, 28],
    'Satisfaction': ['Good', 'Excellent', 'Good', 'Poor', 'Excellent'],
    'Recommend': ['Yes', 'Yes', 'No', 'No', 'Yes']
}

# Create DataFrame
survey_df = pd.DataFrame(data)

# Show summary statistics for numeric columns
print(survey_df.describe())

# Count how many answered each satisfaction level
print(survey_df['Satisfaction'].value_counts())

# Plot satisfaction counts
sns.countplot(x='Satisfaction', data=survey_df)
plt.title('Survey Satisfaction Counts')
plt.show()
```
Output
             Age
count   5.000000
mean   29.000000
std     6.855655
min    22.000000
25%    25.000000
50%    28.000000
75%    30.000000
max    40.000000
Excellent    2
Good         2
Poor         1
Name: Satisfaction, dtype: int64
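Beyond describe() and raw counts, survey results are often reported as percentages and as breakdowns of one question by another. The following is a minimal sketch on the same sample data; value_counts(normalize=True) and pd.crosstab() are standard pandas calls, but pairing Satisfaction with Recommend is just an illustrative choice.

```python
import pandas as pd

# Same sample data as the example above
survey_df = pd.DataFrame({
    'Age': [25, 30, 22, 40, 28],
    'Satisfaction': ['Good', 'Excellent', 'Good', 'Poor', 'Excellent'],
    'Recommend': ['Yes', 'Yes', 'No', 'No', 'Yes'],
})

# Mean and median of a numeric answer
mean_age = survey_df['Age'].mean()
median_age = survey_df['Age'].median()

# Share of respondents per satisfaction level, as percentages
satisfaction_pct = survey_df['Satisfaction'].value_counts(normalize=True) * 100

# Cross-tabulate two questions: rows are satisfaction levels, columns are Recommend answers
recommend_by_satisfaction = pd.crosstab(
    survey_df['Satisfaction'], survey_df['Recommend']
)

print(mean_age, median_age)
print(satisfaction_pct)
print(recommend_by_satisfaction)
```

With only five respondents the percentages are coarse; on real data the same calls show, for example, how many satisfied respondents would also recommend the product.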
Common Pitfalls
Common mistakes include analyzing data before cleaning it, for example leaving missing values or inconsistent answers (such as 'Yes' and 'yes') in place. Forgetting to convert data types (e.g., strings to numbers) can cause errors or make describe() skip a column. Plotting without first inspecting the data can also produce misleading charts.
Always check for missing data with df.isnull().sum() and clean or fill missing values before analysis.
```python
import pandas as pd

# Wrong: not handling missing data
survey_df = pd.DataFrame({
    'Age': [25, None, 22],
    'Satisfaction': ['Good', 'Excellent', None]
})
print(survey_df.describe())  # Skips missing values silently; count drops to 2

# Right: fill missing data before analysis
survey_df['Age'] = survey_df['Age'].fillna(survey_df['Age'].mean())
survey_df['Satisfaction'] = survey_df['Satisfaction'].fillna('Unknown')
print(survey_df.describe())
```
Output
             Age
count   2.000000
mean   23.500000
std     2.121320
min    22.000000
25%    22.750000
50%    23.500000
75%    24.250000
max    25.000000
             Age
count   3.000000
mean   23.500000
std     1.500000
min    22.000000
25%    22.750000
50%    23.500000
75%    24.250000
max    25.000000
By default, describe() reports only numeric columns, so Satisfaction does not appear in either table.
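The other two pitfalls, inconsistent answers and wrong data types, can be handled in a similar way. The following is a minimal sketch on a made-up raw export (the raw_df name and its values are hypothetical): pd.to_numeric() with errors='coerce' turns unparseable text into NaN, and the .str methods normalize spacing and capitalization.

```python
import pandas as pd

# Hypothetical raw export: ages stored as text, answers typed inconsistently
raw_df = pd.DataFrame({
    'Age': ['25', '30', 'unknown', '40'],
    'Recommend': ['Yes', 'yes ', 'NO', None],
})

# Count missing values per column before changing anything
missing_before = raw_df.isnull().sum()
print(missing_before)

# Convert text to numbers; values that cannot be parsed become NaN
raw_df['Age'] = pd.to_numeric(raw_df['Age'], errors='coerce')

# Strip whitespace and unify capitalization so 'yes ' and 'Yes' count as one answer
raw_df['Recommend'] = raw_df['Recommend'].str.strip().str.capitalize()

print(raw_df.dtypes)
print(raw_df['Recommend'].value_counts(dropna=False))
```

Note that coercion makes the 'unknown' age a missing value, so it is worth running isnull().sum() again after converting and then deciding whether to drop or fill it.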
Quick Reference
- Load data: pd.read_csv('file.csv')
- Summary stats: df.describe()
- Count values: df['col'].value_counts()
- Check missing: df.isnull().sum()
- Fill missing: df.fillna(value)
- Plot counts: sns.countplot(x='col', data=df)
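The Syntax section lists plt.bar() as an alternative to sns.countplot(), but the examples only show seaborn. The following is a minimal sketch of the matplotlib-only route, assuming the same sample Satisfaction answers; the plot_answer_counts helper is a hypothetical name, and the "Agg" backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd


def plot_answer_counts(series, title):
    """Draw one bar per distinct answer in `series` and return the Axes."""
    # Sort by answer label so the bar order is predictable
    counts = series.value_counts().sort_index()
    fig, ax = plt.subplots()
    ax.bar(counts.index.astype(str), counts.values)
    ax.set_title(title)
    ax.set_xlabel(series.name)
    ax.set_ylabel('Responses')
    return ax


survey_df = pd.DataFrame({
    'Satisfaction': ['Good', 'Excellent', 'Good', 'Poor', 'Excellent'],
})
ax = plot_answer_counts(survey_df['Satisfaction'], 'Survey Satisfaction Counts')
```

In an interactive session, the backend line can be dropped and plt.show() called afterwards; ax.figure.savefig() writes the chart to a file instead.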
Key Takeaways
Use pandas to load and summarize survey data easily.
Check and handle missing or inconsistent data before analysis.
Use value_counts() to count survey responses quickly.
Visualize data with seaborn or matplotlib for better insights.
Always verify data types and clean data for accurate results.