
How to Use read_csv Parameters in pandas for Data Loading

Use pandas.read_csv() with parameters such as filepath_or_buffer (the file to read), sep (the delimiter), header (the row that holds the column names), and usecols (the columns to load). These parameters control how CSV data is parsed into a DataFrame.

Syntax

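The most commonly used parameters of pandas.read_csv() are:

  • filepath_or_buffer: Path, URL, or file-like object for the CSV data.
  • sep: Character that separates columns; the default is a comma (,).
  • header: Row number to use as column names; the default is 'infer', which uses the first row (row 0) unless names is passed.
  • usecols: List of column names or positions, or a callable, selecting which columns to read.
  • dtype: Data type for the whole table, or a dict mapping column names to types.
  • skiprows: Number of lines to skip at the start of the file, or a list of 0-indexed line numbers to skip.
  • nrows: Number of data rows to read.

These parameters customize how the CSV file is loaded into a DataFrame. An abbreviated signature showing only these parameters: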

python
pandas.read_csv(filepath_or_buffer, sep=',', header='infer', usecols=None, dtype=None, skiprows=None, nrows=None)
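
The dtype and nrows parameters from the list above are not used elsewhere in this article, so here is a minimal sketch of both. The sample data and column names (id, zip_code, amount) are made up for illustration.

```python
import pandas as pd
from io import StringIO

# Hypothetical sample data, used only for illustration
csv_data = """id,zip_code,amount
1,02134,10.5
2,90210,20.0
3,10001,7.25
"""

# dtype keeps zip_code as text so the leading zero survives;
# nrows=2 stops reading after the first two data rows
df = pd.read_csv(StringIO(csv_data), dtype={"zip_code": str}, nrows=2)

print(df)
print(df.dtypes)
```

Without the dtype mapping, zip_code would be inferred as an integer column and 02134 would lose its leading zero.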

Example

This example shows how to read a CSV file with a semicolon separator, skip the first row, and select only specific columns.

python
import pandas as pd
from io import StringIO

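# The first line is a title, not part of the table (sample data for illustration)
csv_data = '''Employee report
Name;Age;City;Salary
John;28;New York;70000
Anna;22;Los Angeles;80000
Mike;32;Chicago;65000'''

# Use StringIO to simulate a file object
file_like = StringIO(csv_data)

# skiprows=1 drops the title line, so the next line becomes the header
df = pd.read_csv(file_like, sep=';', skiprows=1, usecols=['Name', 'City'])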
print(df)
Output
   Name         City
0  John     New York
1  Anna  Los Angeles
2  Mike      Chicago
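
skiprows also accepts a list of 0-indexed line numbers, which helps when the unwanted lines are not all at the top. The sketch below assumes a hypothetical file with a title line, a header, and a units row before the data.

```python
import pandas as pd
from io import StringIO

# Hypothetical sample: a title line, the header, a units row, then data
csv_data = """Sales export
product;price;qty
text;USD;count
pen;1.5;10
book;12.0;3
"""

# Skip line 0 (title) and line 2 (units); line 1 becomes the header
df = pd.read_csv(StringIO(csv_data), sep=";", skiprows=[0, 2])
print(df)
```

Because the units row is gone before parsing, price and qty are read as numeric columns rather than text.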

Common Pitfalls

Common mistakes when using read_csv include:

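  • Not setting the correct sep when the delimiter is not a comma.
  • Forgetting that header is counted after skiprows is applied: header=0 uses the first row that remains, so skipping the real header row promotes a data row to column names.
  • Passing usecols names that do not exist in the file, which raises a ValueError.
  • Not handling missing values or incorrect data types, for example by leaving na_values and dtype unset when the file needs them.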

Always check your CSV file format before setting parameters.

python
import pandas as pd
from io import StringIO

csv_data = 'A|B|C\n1|2|3\n4|5|6'
file_like = StringIO(csv_data)

# Wrong: default sep=',' but file uses '|'
# No error is raised; each line is read as a single column
df_wrong = pd.read_csv(file_like)
print(df_wrong.columns.tolist())

# Right: specify sep='|'
file_like.seek(0)  # reset pointer

df_right = pd.read_csv(file_like, sep='|')
print(df_right)
Output
['A|B|C']
   A  B  C
0  1  2  3
1  4  5  6
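
The header and usecols pitfalls from the list above can be reproduced the same way. This is a small sketch on made-up data; the variable names df_shifted and usecols_error are only for illustration.

```python
import pandas as pd
from io import StringIO

# Hypothetical sample data, used only for illustration
csv_data = "Name;Age\nJohn;28\nAnna;22"

# Pitfall: skiprows=1 drops the real header, so the first data row
# is promoted to column names
df_shifted = pd.read_csv(StringIO(csv_data), sep=";", skiprows=1)
print(df_shifted.columns.tolist())  # ['John', '28']

# Pitfall: a usecols name that is not in the file raises ValueError
usecols_error = None
try:
    pd.read_csv(StringIO(csv_data), sep=";", usecols=["Name", "City"])
except ValueError as err:
    usecols_error = err
    print(f"ValueError: {err}")
```

The first case is the harder one to spot because it fails silently: the call succeeds, but the DataFrame has the wrong column names and one row fewer than expected.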

Quick Reference

Parameter | Description | Default
filepath_or_buffer | File path, URL, or file-like object to read | required (no default)
sep | Delimiter character | ','
header | Row number for column names | 'infer'
usecols | Columns to read (list or callable) | None (all columns)
dtype | Data type for columns | None (infer)
skiprows | Line count or 0-indexed line numbers to skip | None
nrows | Number of rows to read | None (all rows)
Key Takeaways

  • Specify the correct delimiter with the sep parameter to avoid parsing errors.
  • Use header to control which row is used as column names, especially when skipping rows.
  • Select only needed columns with usecols to save memory and speed up loading.
  • Always check your CSV file format before setting parameters to avoid common mistakes.
  • read_csv is flexible and powerful for loading CSV data into pandas DataFrames.