
How to Start Data Analysis with Python: Simple Steps for Beginners

To start data analysis with Python, first install and import key libraries like pandas for data handling and matplotlib for visualization. Then load your data into a DataFrame, explore it with simple commands, and create charts to understand patterns.

Syntax

Here is the basic syntax to start data analysis in Python:

  • import: to bring in libraries like pandas and matplotlib.
  • pd.read_csv(): to load data from a CSV file into a DataFrame.
  • df.head(): to see the first few rows of data.
  • df.describe(): to get summary statistics.
  • plt.plot(): to create a simple line plot.
python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('data.csv')  # Load data
print(df.head())             # Show first 5 rows
print(df.describe())         # Summary stats

plt.plot(df['column_name'])  # Plot data
plt.show()
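
If you do not have a data.csv file yet, the same loading and exploring steps can be tried on CSV text held in memory. The sketch below is only an illustration: the column names and values are made up, and io.StringIO stands in for a real file path.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice this would live in a file such as data.csv.
csv_text = """day,visitors
1,120
2,135
3,128
4,150
"""

# pd.read_csv accepts a file path or a file-like object, so StringIO works here.
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())      # first rows (all 4 here, since head() shows up to 5)
print(df.describe())  # count, mean, std, min, quartiles, max per numeric column
print(df.shape)       # (number of rows, number of columns)
```

Once this runs, swapping io.StringIO(csv_text) for the path to your own file should be the only change needed.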

Example

This example loads a small dataset, shows its first rows, prints summary statistics, and plots one column to visualize trends.

python
import pandas as pd
import matplotlib.pyplot as plt

# Sample data as dictionary
data = {'Year': [2018, 2019, 2020, 2021, 2022],
        'Sales': [250, 270, 300, 320, 360]}

# Convert dictionary to DataFrame
df = pd.DataFrame(data)

# Show first rows
print(df.head())

# Summary statistics
print(df.describe())

# Plot Sales over Years
plt.plot(df['Year'], df['Sales'], marker='o')
plt.title('Sales Over Years')
plt.xlabel('Year')
plt.ylabel('Sales')
plt.grid(True)
plt.show()
Output
   Year  Sales
0  2018    250
1  2019    270
2  2020    300
3  2021    320
4  2022    360
              Year       Sales
count     5.000000    5.000000
mean   2020.000000  300.000000
std       1.581139   43.011626
min    2018.000000  250.000000
25%    2019.000000  270.000000
50%    2020.000000  300.000000
75%    2021.000000  320.000000
max    2022.000000  360.000000
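
To see where the numbers in describe() come from, each row of the summary can be reproduced with a single Series method. This is a minimal sketch that rebuilds the same Year and Sales data as the example; the variable names sales and summary are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Year': [2018, 2019, 2020, 2021, 2022],
                   'Sales': [250, 270, 300, 320, 360]})

sales = df['Sales']
summary = df.describe()

# Each describe() row corresponds to one Series method.
print(sales.count())         # "count": number of non-missing values
print(sales.mean())          # "mean": arithmetic mean
print(sales.std())           # "std": sample standard deviation (divides by n - 1)
print(sales.quantile(0.25))  # "25%": first quartile
print(sales.median())        # "50%": median

# head() returns rows of the data itself; describe() returns one row per statistic.
print(df.head(2))
print(summary.loc['mean', 'Sales'])
```

Comparing these values with the describe() table is a quick way to check that you are reading the summary correctly.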

Common Pitfalls

Beginners often face these issues:

  • Not installing required libraries before importing them.
  • Using wrong file paths when loading data.
  • Confusing DataFrame methods: head() previews the first rows, while describe() returns summary statistics.
  • Forgetting to call plt.show(), so no plot appears when the code runs as a script.
python
import pandas as pd
import matplotlib.pyplot as plt

# Wrong: importing a library that is not installed raises ModuleNotFoundError.
# Right: install both first with: pip install pandas matplotlib

# Wrong: a typo in the file path raises FileNotFoundError.
# df = pd.read_csv('wrong_path.csv')

# Right: use the correct path to your file.
# df = pd.read_csv('data.csv')

# Wrong: plotting without plt.show() means a script never displays the plot.
plt.plot([1, 2, 3], [4, 5, 6])
plt.show()  # Always call this to see the plot
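
The first two pitfalls can also be caught in code before they cause a traceback. The sketch below is one possible approach, not the only one: load_csv is a hypothetical helper written for this illustration, and the check uses importlib.util.find_spec and pathlib.Path from the standard library.

```python
import importlib.util
from pathlib import Path

# Pitfall 1: check that the libraries are installed before importing them.
missing = [name for name in ("pandas", "matplotlib")
           if importlib.util.find_spec(name) is None]
if missing:
    print("Install first with: pip install " + " ".join(missing))

import pandas as pd


# Pitfall 2: confirm the file exists before trying to load it.
def load_csv(path):
    """Return a DataFrame, or None if the file is not there (hypothetical helper)."""
    csv_path = Path(path)
    if not csv_path.is_file():
        # Printing the resolved path shows exactly where Python looked.
        print(f"File not found: {csv_path.resolve()}")
        return None
    return pd.read_csv(csv_path)


df = load_csv("wrong_path.csv")
```

Printing the resolved path is often the fastest way to spot that the script is running from a different folder than the one that holds the data.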

Quick Reference

Here is a quick cheat sheet for starting data analysis with Python:

| Command | Purpose |
| --- | --- |
| import pandas as pd | Import pandas library for data handling |
| import matplotlib.pyplot as plt | Import matplotlib for plotting |
| pd.read_csv('file.csv') | Load CSV data into DataFrame |
| df.head() | View first 5 rows of data |
| df.describe() | Get summary statistics |
| plt.plot(x, y) | Create a line plot |
| plt.show() | Display the plot |

Key Takeaways

  • Install and import pandas and matplotlib to start data analysis in Python.
  • Load data into a DataFrame using pd.read_csv() for easy manipulation.
  • Use df.head() and df.describe() to explore your data quickly.
  • Visualize data with matplotlib by plotting and calling plt.show().
  • Check file paths and library installations to avoid common errors.