
How to Use usecols in pandas read_csv to Load Selected Columns

Use the usecols parameter in pandas.read_csv() to select specific columns to load from a CSV file. You can pass a list of column names or column indices to usecols to read only those columns, which saves memory and speeds up loading.

Syntax

The usecols parameter in pandas.read_csv() lets you specify which columns to load from a CSV file.

  • usecols=None: loads all columns (default).
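  • usecols=[list]: a list of column names (strings) or zero-based column positions (integers) to load, but not a mix of both.
  • usecols=function: a callable that receives each column name and returns True for the columns to load.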
python
pandas.read_csv(filepath_or_buffer, usecols=None, ...)

# Examples:
# usecols=['col1', 'col3']  # load columns named 'col1' and 'col3'
# usecols=[0, 2]            # load first and third columns by index
# usecols=lambda x: x.startswith('A')  # load columns starting with 'A'
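The three forms above can be tried side by side. The following is a minimal sketch, assuming a small made-up CSV whose column names (Alpha, Beta, Amount) are purely illustrative. Each call should select the same two columns.

```python
import pandas as pd
from io import StringIO

# Hypothetical sample data; the column names are invented for illustration
csv_data = "Alpha,Beta,Amount\n1,2,3\n4,5,6\n"

# 1. Select by column name
by_name = pd.read_csv(StringIO(csv_data), usecols=["Alpha", "Amount"])

# 2. Select by zero-based position (first and third columns)
by_index = pd.read_csv(StringIO(csv_data), usecols=[0, 2])

# 3. Select with a callable that is evaluated against each column name
by_func = pd.read_csv(StringIO(csv_data), usecols=lambda name: name.startswith("A"))

print(list(by_name.columns))
print(list(by_index.columns))
print(list(by_func.columns))
```

A fresh StringIO is created for each call because a file-like object is consumed once it has been read.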

Example

This example shows how to load only specific columns from a CSV file using usecols. It reads only the 'Name' and 'Age' columns from the data.

python
import pandas as pd
from io import StringIO

csv_data = '''Name,Age,City,Salary
Alice,30,New York,70000
Bob,25,Los Angeles,50000
Charlie,35,Chicago,60000
'''

# Use StringIO to simulate a file object
file_like = StringIO(csv_data)

# Read only 'Name' and 'Age' columns
df = pd.read_csv(file_like, usecols=['Name', 'Age'])
print(df)
Output
      Name  Age
0    Alice   30
1      Bob   25
2  Charlie   35
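One detail worth knowing about the example above: the order of the entries in usecols is ignored, so the resulting columns follow the order in the file. The sketch below, which reuses a shortened copy of the same sample data, shows one way to get a specific column order by reindexing after the read.

```python
import pandas as pd
from io import StringIO

csv_data = "Name,Age,City,Salary\nAlice,30,New York,70000\nBob,25,Los Angeles,50000\n"

# 'Age' is listed first, but the file order (Name, Age) is what comes back
df = pd.read_csv(StringIO(csv_data), usecols=["Age", "Name"])
print(list(df.columns))  # ['Name', 'Age']

# Select the columns again in the order you want
reordered = df[["Age", "Name"]]
print(list(reordered.columns))  # ['Age', 'Name']
```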

Common Pitfalls

Common mistakes when using usecols include:

  • Passing column names that do not exist in the CSV file raises a ValueError.
  • Using column indices that are out of range will raise an error.
  • Mixing column names and indices in the same usecols list is not allowed.
  • Omitting usecols on large files loads every column, which wastes memory and time when only a few columns are needed.
python
import pandas as pd
from io import StringIO

csv_data = 'A,B,C\n1,2,3\n4,5,6'
file_like = StringIO(csv_data)

# Wrong: column 'D' does not exist
try:
    pd.read_csv(file_like, usecols=['A', 'D'])
except ValueError as e:
    print(f'Error: {e}')

# Correct: use existing columns
file_like.seek(0)
df = pd.read_csv(file_like, usecols=['A', 'C'])
print(df)
Output
Error: Usecols do not match columns, columns expected but not found: ['D']
   A  C
0  1  3
1  4  6
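The rule against mixing names and indices can be checked the same way. This is a minimal sketch using the same three-column data. The exact error message may vary between pandas versions, so it is printed rather than assumed.

```python
import pandas as pd
from io import StringIO

csv_data = "A,B,C\n1,2,3\n4,5,6"

# Wrong: 'A' is a name and 2 is a position, so the list is mixed
mixed_error = None
try:
    pd.read_csv(StringIO(csv_data), usecols=["A", 2])
except ValueError as e:
    mixed_error = e
    print(f"Error: {e}")

# Correct: use names only (or positions only, e.g. usecols=[0, 2])
df = pd.read_csv(StringIO(csv_data), usecols=["A", "C"])
print(df)
```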

Quick Reference

Parameter | Description | Example
usecols=None | Load all columns (default) | pd.read_csv('file.csv')
usecols=[list of names] | Load columns by name | usecols=['Name', 'Age']
usecols=[list of indices] | Load columns by position | usecols=[0, 2]
usecols=function | Load columns where function returns True | usecols=lambda x: x.startswith('A')

Key Takeaways

  • Use the usecols parameter in read_csv to load only needed columns and save memory.
  • You can specify columns by names, indices, or a function that filters column names.
  • Passing invalid column names or indices causes errors, so check your CSV headers.
  • Using usecols can speed up reading large CSV files by skipping unwanted columns.
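To see the memory effect for yourself, you can compare the footprint of a full load with a selective one. The sketch below assumes a synthetic 1,000-row dataset generated in memory (the values are made up). Actual savings depend on your data and column types.

```python
import pandas as pd
from io import StringIO

# Build a synthetic CSV in memory; the rows are generated purely for illustration
rows = "\n".join(f"name{i},{i},city{i},{i * 10}" for i in range(1000))
csv_data = "Name,Age,City,Salary\n" + rows + "\n"

# Load every column, then only two of the four
full = pd.read_csv(StringIO(csv_data))
subset = pd.read_csv(StringIO(csv_data), usecols=["Name", "Age"])

# deep=True also counts the memory held by string values
full_bytes = full.memory_usage(deep=True).sum()
subset_bytes = subset.memory_usage(deep=True).sum()

print(f"all columns: {full_bytes} bytes")
print(f"two columns: {subset_bytes} bytes")
```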