How to Analyze Customer Data in Python: Simple Steps
To analyze customer data in Python, use pandas to load and manipulate the data, then apply methods such as groupby() and describe() to summarize it. Visualize trends with matplotlib or seaborn to better understand customer behavior.
Syntax
Use pandas.read_csv() to load data from a file. Use DataFrame.groupby() to group data by categories. Use DataFrame.describe() to get summary statistics. Use matplotlib.pyplot.plot() or seaborn.barplot() to create charts.
```python
import pandas as pd
import matplotlib.pyplot as plt

# Load data
customer_data = pd.read_csv('customers.csv')

# Group data by a column
grouped = customer_data.groupby('Category')

# Summary statistics
summary = customer_data.describe()

# Plot example
plt.plot(customer_data['Age'])
plt.show()
```
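The syntax block above reads from `customers.csv`, which is not shown. As a minimal, self-contained sketch, the snippet below passes a small hypothetical CSV string to `pd.read_csv()` through `io.StringIO`. It also aggregates the grouped data, because `groupby()` on its own only returns a GroupBy object; you get numbers once you call a method such as `mean()` or `size()` on it. The columns `Category` and `Spend` and their values are assumptions for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for customers.csv
csv_text = """CustomerID,Category,Age,Spend
1,Retail,25,100
2,Online,30,150
3,Online,22,80
4,Retail,35,200
"""

customer_data = pd.read_csv(io.StringIO(csv_text))

# groupby() alone returns a GroupBy object; aggregate it to get results
avg_spend = customer_data.groupby('Category')['Spend'].mean()
group_sizes = customer_data.groupby('Category').size()

# describe() summarizes numeric columns (count, mean, std, min, quartiles, max)
summary = customer_data.describe()

print(avg_spend)
print(group_sizes)
print(summary)
```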
Example
This example loads customer data, groups customers by gender, calculates average age, and plots the count of customers by gender.
```python
import pandas as pd
import matplotlib.pyplot as plt

# Sample customer data
data = {
    'CustomerID': [1, 2, 3, 4, 5],
    'Gender': ['Male', 'Female', 'Female', 'Male', 'Female'],
    'Age': [25, 30, 22, 35, 28],
    'PurchaseAmount': [100, 150, 80, 200, 120],
}

# Create DataFrame
customers = pd.DataFrame(data)

# Group by Gender and calculate average age
avg_age = customers.groupby('Gender')['Age'].mean()

# Count customers by Gender
gender_counts = customers['Gender'].value_counts()

# Print before plotting: plt.show() blocks until the chart window is closed
print('Average Age by Gender:')
print(avg_age)

# Plot customer count by Gender.
# value_counts() sorts by count, so assign colors by label, not by position
colors = ['pink' if g == 'Female' else 'blue' for g in gender_counts.index]
gender_counts.plot(kind='bar', color=colors)
plt.title('Customer Count by Gender')
plt.xlabel('Gender')
plt.ylabel('Count')
plt.show()
```
Output
```
Average Age by Gender:
Gender
Female    26.666667
Male      30.000000
Name: Age, dtype: float64
```
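The same grouping can summarize several columns at once. The sketch below reuses the sample data from the example and uses pandas named aggregation in `agg()` to compute, per gender, the number of customers, the average age, and the total purchase amount. The output column names `customers`, `avg_age`, and `total_spent` are arbitrary labels chosen for illustration.

```python
import pandas as pd

# Same sample data as the example above
data = {
    'CustomerID': [1, 2, 3, 4, 5],
    'Gender': ['Male', 'Female', 'Female', 'Male', 'Female'],
    'Age': [25, 30, 22, 35, 28],
    'PurchaseAmount': [100, 150, 80, 200, 120],
}
customers = pd.DataFrame(data)

# Named aggregation: output_column=(input_column, aggregation_function)
summary = customers.groupby('Gender').agg(
    customers=('CustomerID', 'count'),
    avg_age=('Age', 'mean'),
    total_spent=('PurchaseAmount', 'sum'),
)

print(summary)
```

Named aggregation keeps each output column explicit, which is easier to read than passing a nested dictionary of functions.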
Common Pitfalls
Common mistakes include not handling missing data, which can cause errors or silently wrong results. Forgetting to convert data types (for example, dates or numbers stored as text) can lead to incorrect analysis. Also, plotting without first inspecting the data (its value ranges and categories) can produce confusing or misleading charts.
Always check for missing values with DataFrame.isnull() and clean or fill them before analysis.
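```python
import pandas as pd

data = pd.read_csv('customers.csv')

# Wrong: computing statistics without checking for missing values.
# mean() skips NaN by default, so this silently averages only
# the rows that have an Age value.
# print(data['Age'].mean())

# Right: check how many values are missing in each column first
print(data.isnull().sum())

# Then handle them, e.g. fill missing Age with the average of the known ages
data['Age'] = data['Age'].fillna(data['Age'].mean())

# Confirm no missing ages remain
print(data['Age'].isnull().sum())
```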
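To see the effect without a CSV file, the sketch below builds a small hypothetical DataFrame with two missing ages, counts them with `isnull().sum()`, and compares two common fixes: filling with the mean and dropping the incomplete rows with `dropna()`. Because `mean()` skips missing values by default, filling with the mean leaves the mean itself unchanged; what changes is that every row now has a usable value for later steps. Which fix is appropriate depends on the analysis.

```python
import numpy as np
import pandas as pd

# Hypothetical data with two missing ages
data = pd.DataFrame({
    'CustomerID': [1, 2, 3, 4, 5],
    'Age': [25, np.nan, 22, np.nan, 28],
})

# Count missing values per column
missing = data.isnull().sum()
print(missing)

# mean() skips NaN by default, so this averages only the 3 known ages
known_mean = data['Age'].mean()

# Option 1: fill missing ages with the mean of the known ages
filled = data['Age'].fillna(known_mean)

# Option 2: drop rows that have a missing age
dropped = data.dropna(subset=['Age'])

print(known_mean)
print(filled.tolist())
print(len(dropped))
```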
Quick Reference
| Function | Purpose | Example |
|---|---|---|
| pandas.read_csv() | Load data from CSV file | df = pd.read_csv('file.csv') |
| DataFrame.groupby() | Group data by column | df.groupby('Category') |
| DataFrame.describe() | Get summary statistics | df.describe() |
| DataFrame.isnull() | Check for missing values | df.isnull().sum() |
| matplotlib.pyplot.plot() | Create line plots | plt.plot(df['Age']) |
| DataFrame.fillna() | Fill missing data | df['Age'].fillna(30) |
Key Takeaways
Use pandas to load and manipulate customer data easily.
Group and summarize data with pandas functions like groupby and describe.
Always check and handle missing data before analysis.
Visualize data with matplotlib or seaborn to find patterns.
Convert columns to the correct data types and clean invalid values for accurate results.
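Converting data types usually means turning text columns into numbers and dates before analysis. As a minimal sketch, assuming a hypothetical raw table in which every column arrived as text, `astype()`, `pd.to_datetime()`, and `pd.to_numeric()` do the conversion. With `errors='coerce'`, unparseable entries become missing values (`NaT` or `NaN`) instead of raising an error, so they can then be handled like any other missing data.

```python
import pandas as pd

# Hypothetical raw data where every column arrived as text
raw = pd.DataFrame({
    'CustomerID': ['1', '2', '3'],
    'SignupDate': ['2024-01-15', '2024-02-03', 'not a date'],
    'PurchaseAmount': ['100', '150.5', 'n/a'],
})

clean = raw.copy()

# Convert IDs to integers
clean['CustomerID'] = clean['CustomerID'].astype(int)

# Unparseable entries become NaT / NaN instead of raising an error
clean['SignupDate'] = pd.to_datetime(clean['SignupDate'], errors='coerce')
clean['PurchaseAmount'] = pd.to_numeric(clean['PurchaseAmount'], errors='coerce')

print(clean.dtypes)

# Numeric and date operations now work (NaN and NaT are skipped)
print(clean['PurchaseAmount'].sum())
print(clean['SignupDate'].min())
```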