How to Use usecols in read_csv in pandas for Selective Columns
Use the usecols parameter in pandas.read_csv() to load only specific columns from a CSV file. You can pass a list of column names, a list of column indices, or a function to usecols so that only those columns are read, which reduces memory use and usually speeds up loading.
Syntax
The usecols parameter in pandas.read_csv() lets you specify which columns to load from a CSV file.
- usecols=None: loads all columns (default).
- usecols=[list]: list of column names or indices to load.
- usecols=function: a function that returns True for columns to load.
python
pandas.read_csv(filepath_or_buffer, usecols=None, ...)

# Examples:
# usecols=['col1', 'col3']              # load columns named 'col1' and 'col3'
# usecols=[0, 2]                        # load first and third columns by index
# usecols=lambda x: x.startswith('A')   # load columns starting with 'A'
Example
This example shows how to load only specific columns from a CSV file using usecols. It reads only the 'Name' and 'Age' columns from the data.
python
import pandas as pd
from io import StringIO

csv_data = '''Name,Age,City,Salary
Alice,30,New York,70000
Bob,25,Los Angeles,50000
Charlie,35,Chicago,60000
'''

# Use StringIO to simulate a file object
file_like = StringIO(csv_data)

# Read only 'Name' and 'Age' columns
df = pd.read_csv(file_like, usecols=['Name', 'Age'])
print(df)
Output
      Name  Age
0    Alice   30
1      Bob   25
2  Charlie   35
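The same two columns can also be selected with the other forms of usecols. The snippet below is a minimal sketch that reuses the sample data from the example above; the variable names by_index and by_callable are only illustrative.

```python
import pandas as pd
from io import StringIO

csv_data = '''Name,Age,City,Salary
Alice,30,New York,70000
Bob,25,Los Angeles,50000
Charlie,35,Chicago,60000
'''

# Select by zero-based position: 0 -> Name, 1 -> Age
by_index = pd.read_csv(StringIO(csv_data), usecols=[0, 1])

# Select with a callable: it receives each column name and
# the column is kept when the callable returns True
by_callable = pd.read_csv(
    StringIO(csv_data),
    usecols=lambda name: name in {'Name', 'Age'},
)

print(by_index.columns.tolist())
print(by_callable.equals(by_index))
```

Both calls should produce the same DataFrame as the name-based version, so the choice between them mostly depends on whether the file has reliable headers.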
Common Pitfalls
Common mistakes when using usecols include:
- Passing column names that do not exist in the CSV file raises a ValueError.
- Using column indices that are out of range also raises an error.
- Mixing column names and indices in the same usecols list is not allowed.
- Omitting usecols on large files loads every column; specifying usecols improves performance by loading less data.
python
import pandas as pd
from io import StringIO

csv_data = 'A,B,C\n1,2,3\n4,5,6'
file_like = StringIO(csv_data)

# Wrong: column 'D' does not exist
try:
    pd.read_csv(file_like, usecols=['A', 'D'])
except ValueError as e:
    print(f'Error: {e}')

# Correct: use existing columns
file_like.seek(0)  # rewind the buffer before reading it again
df = pd.read_csv(file_like, usecols=['A', 'C'])
print(df)
Output
Error: Usecols do not match columns, columns expected but not found: ['D']
   A  C
0  1  3
1  4  6
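The pitfall of mixing names and indices can be reproduced in the same way. This is a small sketch on the same three-column data; it prints whatever message the installed pandas version produces and does not assume the exact wording.

```python
import pandas as pd
from io import StringIO

csv_data = 'A,B,C\n1,2,3\n4,5,6'

# Wrong: 'A' is a name and 2 is a position; one list cannot mix both styles
try:
    pd.read_csv(StringIO(csv_data), usecols=['A', 2])
    mixed_failed = False
except ValueError as e:
    mixed_failed = True
    print(f'Error: {e}')

# Correct: use one style consistently
by_name = pd.read_csv(StringIO(csv_data), usecols=['A', 'C'])
by_position = pd.read_csv(StringIO(csv_data), usecols=[0, 2])
print(by_name.equals(by_position))
```

A fresh StringIO is created for each call here, which avoids having to rewind a shared buffer between reads.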
Quick Reference
| Parameter | Description | Example |
|---|---|---|
| usecols=None | Load all columns (default) | pd.read_csv('file.csv') |
| usecols=[list of names] | Load columns by name | usecols=['Name', 'Age'] |
| usecols=[list of indices] | Load columns by position | usecols=[0, 2] |
| usecols=function | Load columns where function returns True | usecols=lambda x: x.startswith('A') |
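To get a rough feel for the memory effect of usecols, the two results can be compared with DataFrame.memory_usage. The sketch below uses the small sample data from the earlier example, so the absolute numbers are tiny and will vary by pandas version; full_bytes and subset_bytes are illustrative names.

```python
import pandas as pd
from io import StringIO

csv_data = '''Name,Age,City,Salary
Alice,30,New York,70000
Bob,25,Los Angeles,50000
Charlie,35,Chicago,60000
'''

full = pd.read_csv(StringIO(csv_data))
subset = pd.read_csv(StringIO(csv_data), usecols=['Name', 'Age'])

# deep=True includes the actual size of the string data in text columns
full_bytes = int(full.memory_usage(deep=True).sum())
subset_bytes = int(subset.memory_usage(deep=True).sum())

print(f'all columns: {full_bytes} bytes')
print(f'Name and Age only: {subset_bytes} bytes')
```

On a real file with many wide columns the gap is typically much larger than on this toy input.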
Key Takeaways
Use the usecols parameter in read_csv to load only needed columns and save memory.
You can specify columns by names, indices, or a function that filters column names.
Passing invalid column names or indices causes errors, so check your CSV headers.
Using usecols can speed up reading large CSV files by skipping unwanted columns.
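One way to act on the advice about checking headers is to read only the header row first and keep just the requested names that exist. The sketch below assumes that approach; wanted, available, present, and missing are illustrative names, and nrows=0 is used so that no data rows are parsed.

```python
import pandas as pd
from io import StringIO

csv_data = 'A,B,C\n1,2,3\n4,5,6'
wanted = ['A', 'D']  # hypothetical request; 'D' is not in the file

# Read the header only: nrows=0 gives an empty DataFrame with the column names
available = pd.read_csv(StringIO(csv_data), nrows=0).columns.tolist()

# Keep the requested columns that exist and report the rest
present = [c for c in wanted if c in available]
missing = [c for c in wanted if c not in available]
if missing:
    print(f'Skipping missing columns: {missing}')

df = pd.read_csv(StringIO(csv_data), usecols=present)
print(df.columns.tolist())
```

A callable such as lambda c: c in wanted gives a similar result in a single pass, because a callable is only evaluated against the names that are actually in the file, but it will not report which requested columns were missing.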