How to Start Data Analysis with Python: Simple Steps for Beginners
To start data analysis with Python, first install and import key libraries such as pandas for data handling and matplotlib for visualization. Then load your data into a DataFrame, explore it with simple commands, and create charts to understand patterns.
Syntax
Here is the basic syntax to start data analysis in Python:
- import: brings in libraries such as pandas and matplotlib.
- pd.read_csv(): loads data from a CSV file into a DataFrame.
- df.head(): shows the first few rows of data.
- df.describe(): returns summary statistics.
- plt.plot(): creates a simple line plot.
```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('data.csv')  # Load data
print(df.head())              # Show first 5 rows
print(df.describe())          # Summary stats

plt.plot(df['column_name'])   # Plot data
plt.show()
```
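The syntax above assumes a file named data.csv already exists. As a minimal sketch for trying the loading step without one, the snippet below writes a tiny CSV to a temporary folder and reads it back. The column names `day` and `visitors` and the values are made up for illustration only.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample content; any CSV with a header row loads the same way.
csv_text = "day,visitors\n1,120\n2,135\n3,150\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = Path(tmp_dir) / "data.csv"
    csv_path.write_text(csv_text)

    df = pd.read_csv(csv_path)  # Load the file into a DataFrame

print(df.head())   # First rows (all three rows here)
print(df.shape)    # (number of rows, number of columns)
print(df.dtypes)   # Column types inferred by pandas
```

Once this works, replacing `csv_path` with the path to a real file is the only change needed.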
Example
This example loads a small dataset, shows its first rows, prints summary statistics, and plots one column to visualize trends.
```python
import pandas as pd
import matplotlib.pyplot as plt

# Sample data as dictionary
data = {'Year': [2018, 2019, 2020, 2021, 2022],
        'Sales': [250, 270, 300, 320, 360]}

# Convert dictionary to DataFrame
df = pd.DataFrame(data)

# Show first rows
print(df.head())

# Summary statistics
print(df.describe())

# Plot Sales over Years
plt.plot(df['Year'], df['Sales'], marker='o')
plt.title('Sales Over Years')
plt.xlabel('Year')
plt.ylabel('Sales')
plt.grid(True)
plt.show()
```
Output
   Year  Sales
0  2018    250
1  2019    270
2  2020    300
3  2021    320
4  2022    360
              Year       Sales
count     5.000000    5.000000
mean   2020.000000  300.000000
std       1.581139   43.011626
min    2018.000000  250.000000
25%    2019.000000  270.000000
50%    2020.000000  300.000000
75%    2021.000000  320.000000
max    2022.000000  360.000000
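The figures that df.describe() reports can be reproduced by hand, which helps show what each row means. The sketch below recomputes the mean, the standard deviation, and the 25% quartile for the Sales column from the example. It assumes the pandas defaults: a sample standard deviation (dividing by n - 1) and linear interpolation for quantiles.

```python
import pandas as pd

df = pd.DataFrame({'Year': [2018, 2019, 2020, 2021, 2022],
                   'Sales': [250, 270, 300, 320, 360]})

sales = df['Sales']
n = len(sales)

# Mean: sum of the values divided by how many there are
mean_sales = sales.sum() / n

# Sample standard deviation: squared distances from the mean, divided by n - 1
variance = ((sales - mean_sales) ** 2).sum() / (n - 1)
std_sales = variance ** 0.5

print(mean_sales)             # 300.0
print(round(std_sales, 6))    # 43.011626
print(sales.quantile(0.25))   # 270.0
```

The hand-computed values should match what `sales.mean()` and `sales.std()` return for the same column.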
Common Pitfalls
Beginners often face these issues:
- Not installing required libraries before importing them.
- Using wrong file paths when loading data.
- Confusing DataFrame methods like head() and describe().
- Forgetting to call plt.show() to display plots.
```python
import pandas as pd
import matplotlib.pyplot as plt

# Wrong: forgetting to install pandas or matplotlib causes errors
# Right way: make sure to install with: pip install pandas matplotlib

# Wrong: file path typo
# df = pd.read_csv('wrong_path.csv')  # FileNotFoundError

# Right way: use the correct file path
# df = pd.read_csv('data.csv')

# Forgetting plt.show() means the plot won't display
# plt.plot([1, 2, 3], [4, 5, 6])
# plt.show()  # Always call this to see the plot
```
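One way to make the file path pitfall easier to diagnose is to check that the file exists before loading it. The sketch below is only an illustration: `load_csv` is a hypothetical helper, not part of pandas, and it prints the full path that was tried so a typo or a wrong working folder is easier to spot.

```python
from pathlib import Path

import pandas as pd


def load_csv(path_str):
    """Hypothetical helper: return a DataFrame, or None if the file is missing."""
    path = Path(path_str)
    if not path.exists():
        # Show the absolute path so it is clear where Python actually looked
        print(f"File not found: {path.resolve()}")
        return None
    return pd.read_csv(path)


df = load_csv('wrong_path.csv')
if df is None:
    print("Check the file name and the folder the script is run from.")
```

Letting pd.read_csv() raise FileNotFoundError is also fine. The explicit check simply produces a friendlier message for beginners.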
Quick Reference
Here is a quick cheat sheet for starting data analysis with Python:
| Command | Purpose |
|---|---|
| import pandas as pd | Import pandas library for data handling |
| import matplotlib.pyplot as plt | Import matplotlib for plotting |
| pd.read_csv('file.csv') | Load CSV data into DataFrame |
| df.head() | View first 5 rows of data |
| df.describe() | Get summary statistics |
| plt.plot(x, y) | Create a line plot |
| plt.show() | Display the plot |
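To make the difference between df.head() and df.describe() in the table concrete, the sketch below compares what each one returns. The ten-row DataFrame with columns `x` and `y` is made up for illustration.

```python
import pandas as pd

# Hypothetical data: ten rows, so head() visibly returns only part of it
df = pd.DataFrame({'x': range(10), 'y': [v * v for v in range(10)]})

first_rows = df.head()     # Raw rows, 5 by default
first_three = df.head(3)   # Pass a number to change how many rows come back
summary = df.describe()    # One row per statistic, not per record

print(first_rows.shape)     # (5, 2)
print(first_three.shape)    # (3, 2)
print(list(summary.index))  # Names of the statistics describe() computes
```

In short, head() shows actual records, while describe() replaces them with aggregate values such as count, mean, and std.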
Key Takeaways
- Install and import pandas and matplotlib to start data analysis in Python.
- Load data into a DataFrame using pd.read_csv() for easy manipulation.
- Use df.head() and df.describe() to explore your data quickly.
- Visualize data with matplotlib by plotting and calling plt.show().
- Check file paths and library installations to avoid common errors.