
How to Use usecols in read_csv in pandas for Selective Columns

Use the usecols parameter in pandas.read_csv() to select specific columns to load from a CSV file. You can pass a list of column names or column indices to usecols to read only those columns, which saves memory and speeds up loading.
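To see the memory effect, you can read the same data twice and compare the totals from `DataFrame.memory_usage(deep=True)`. The sketch below uses a hypothetical, generated 20-column CSV. The column names `c0` to `c19` are invented for illustration, and exact byte counts will depend on your data and pandas version.

```python
import pandas as pd
from io import StringIO

# Hypothetical wide file: 20 integer columns (c0..c19), 1,000 rows
header = ",".join(f"c{i}" for i in range(20))
rows = "\n".join(",".join(str(r * 20 + i) for i in range(20)) for r in range(1000))
csv_data = header + "\n" + rows + "\n"

# Read everything, then read only two columns
full = pd.read_csv(StringIO(csv_data))
subset = pd.read_csv(StringIO(csv_data), usecols=["c0", "c1"])

# Total bytes held by each DataFrame (index included)
full_bytes = full.memory_usage(deep=True).sum()
subset_bytes = subset.memory_usage(deep=True).sum()

print(full.shape, subset.shape)   # (1000, 20) (1000, 2)
print(subset_bytes < full_bytes)  # True
```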


Syntax

The usecols parameter in pandas.read_csv() lets you specify which columns to load from a CSV file.

usecols=None: loads all columns (default).
usecols=[list]: a list of column names, or a list of 0-based column positions, to load.
usecols=function: a callable that is called with each column name and returns True for the columns to load.

python

pandas.read_csv(filepath_or_buffer, usecols=None, ...)

# Examples:
# usecols=['col1', 'col3']  # load columns named 'col1' and 'col3'
# usecols=[0, 2]            # load first and third columns by index
# usecols=lambda x: x.startswith('A')  # load columns starting with 'A'
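The three forms can be tried side by side on a small in-memory file. This sketch uses a made-up header (`Apple`, `Banana`, `Avocado`, `Cherry`) only so that the callable has something to match.

```python
import pandas as pd
from io import StringIO

# Hypothetical CSV with four columns
csv_data = "Apple,Banana,Avocado,Cherry\n1,2,3,4\n5,6,7,8\n"

# By name: only the listed columns are loaded
by_name = pd.read_csv(StringIO(csv_data), usecols=["Apple", "Cherry"])

# By position: 0-based indices into the header row
by_index = pd.read_csv(StringIO(csv_data), usecols=[0, 2])

# By callable: called with each column name; keep the column if it returns True
by_func = pd.read_csv(StringIO(csv_data), usecols=lambda name: name.startswith("A"))

print(list(by_name.columns))   # ['Apple', 'Cherry']
print(list(by_index.columns))  # ['Apple', 'Avocado']
print(list(by_func.columns))   # ['Apple', 'Avocado']
```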


Example

This example shows how to load only specific columns from a CSV file using usecols. It reads only the 'Name' and 'Age' columns from the data.

python

import pandas as pd
from io import StringIO

csv_data = '''Name,Age,City,Salary
Alice,30,New York,70000
Bob,25,Los Angeles,50000
Charlie,35,Chicago,60000
'''

# Use StringIO to simulate a file object
file_like = StringIO(csv_data)

# Read only 'Name' and 'Age' columns
df = pd.read_csv(file_like, usecols=['Name', 'Age'])
print(df)

Output

      Name  Age
0    Alice   30
1      Bob   25
2  Charlie   35
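Note that usecols only selects columns; it does not reorder them. The pandas documentation states that element order in usecols is ignored, so the result keeps the column order of the file. The sketch below reuses the sample data above; the selection `["Salary", "Name"]` is just an illustration. If you need a specific order, index the result with the same list.

```python
import pandas as pd
from io import StringIO

csv_data = '''Name,Age,City,Salary
Alice,30,New York,70000
Bob,25,Los Angeles,50000
Charlie,35,Chicago,60000
'''

# The order inside usecols is ignored: columns come back in file order
df = pd.read_csv(StringIO(csv_data), usecols=["Salary", "Name"])
print(list(df.columns))  # ['Name', 'Salary']

# Select again with the same list to impose your own order
cols = ["Salary", "Name"]
df_ordered = pd.read_csv(StringIO(csv_data), usecols=cols)[cols]
print(list(df_ordered.columns))  # ['Salary', 'Name']
```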


Common Pitfalls

Common mistakes when using usecols include:

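Passing column names that do not exist in the CSV file raises a ValueError.
Using column positions that are out of range for the file also raises an error.
Mixing column names and positions in the same usecols list is not allowed and raises a ValueError; use all names or all positions.
Tip: on large files, usecols is worth using for performance, because pandas does not build the columns you leave out, which reduces memory use and load time.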

python

import pandas as pd
from io import StringIO

csv_data = 'A,B,C\n1,2,3\n4,5,6'
file_like = StringIO(csv_data)

# Wrong: column 'D' does not exist
try:
    pd.read_csv(file_like, usecols=['A', 'D'])
except ValueError as e:
    print(f'Error: {e}')

# Correct: use existing columns
file_like.seek(0)
df = pd.read_csv(file_like, usecols=['A', 'C'])
print(df)

Output

Error: Usecols do not match columns, columns expected but not found: ['D']
   A  C
0  1  3
1  4  6
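The mixing rule can be checked the same way. This sketch reuses the small `A,B,C` file from above and passes one name and one position together; pandas rejects the list with a ValueError, while an all-names list and an all-positions list select the same columns. The exact error wording may differ between pandas versions, so it is printed rather than asserted.

```python
import pandas as pd
from io import StringIO

csv_data = "A,B,C\n1,2,3\n4,5,6\n"

# Wrong: a name ('A') and a position (2) in the same list
try:
    pd.read_csv(StringIO(csv_data), usecols=["A", 2])
    mixed_error = None
except ValueError as e:
    mixed_error = str(e)
print(mixed_error)

# Correct: pick one style, all names or all positions
df_names = pd.read_csv(StringIO(csv_data), usecols=["A", "C"])
df_positions = pd.read_csv(StringIO(csv_data), usecols=[0, 2])
print(df_names.equals(df_positions))  # True
```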


Quick Reference

Parameter	Description	Example
usecols=None	Load all columns (default)	pd.read_csv('file.csv')
usecols=[list of names]	Load columns by name	usecols=['Name', 'Age']
usecols=[list of indices]	Load columns by position	usecols=[0, 2]
usecols=function	Load columns where function returns True	usecols=lambda x: x.startswith('A')
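Because the callable form receives each column name, it is also a convenient way to leave out a few known columns instead of listing every column to keep. A minimal sketch, assuming hypothetical columns `City` and `Salary` are the ones you want to skip:

```python
import pandas as pd
from io import StringIO

csv_data = "Name,Age,City,Salary\nAlice,30,New York,70000\nBob,25,Los Angeles,50000\n"

# Hypothetical set of columns to skip; every other column is loaded
drop = {"City", "Salary"}
df = pd.read_csv(StringIO(csv_data), usecols=lambda col: col not in drop)
print(list(df.columns))  # ['Name', 'Age']
```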


Key Takeaways

Use the usecols parameter in read_csv to load only needed columns and save memory.

You can specify columns by names, indices, or a function that filters column names.

Passing invalid column names or indices causes errors, so check your CSV headers.

Using usecols can speed up reading large CSV files by skipping unwanted columns.