How to Use read_csv Parameters in pandas for Data Loading
Use pandas.read_csv() with parameters such as filepath_or_buffer to specify the file path, sep to set the delimiter, header to choose the header row, and usecols to select columns. These parameters control how CSV data is read into a DataFrame.
Syntax
The basic syntax of pandas.read_csv(), showing only the parameters covered here, is:
```python
pandas.read_csv(filepath_or_buffer, sep=',', header='infer', usecols=None, dtype=None, skiprows=None, nrows=None)
```
- filepath_or_buffer: Path or URL of the CSV file, or a file-like object.
- sep: Character that separates columns; the default is a comma (,).
- header: Row number to use as column names; the default is 'infer', which behaves like header=0 when no column names are passed explicitly.
- usecols: List of columns to read from the file.
- dtype: Data type for the columns, either a single type or a dict mapping column names to types.
- skiprows: Number of rows to skip at the start of the file, or a list of 0-based row indices to skip.
- nrows: Number of rows to read.
These parameters customize how the CSV file is loaded into a DataFrame.
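The dtype and nrows parameters are easier to judge with a concrete case. The following is a minimal sketch, assuming a small in-memory CSV whose column names and values are invented for illustration: dtype keeps a ZIP code column as text, and nrows limits how many data rows are read.

```python
import pandas as pd
from io import StringIO

# Hypothetical sample data, invented for this illustration
csv_data = "id,zip,score\n1,02134,9.5\n2,10001,7.0\n3,94105,8.25\n"

# dtype keeps the ZIP code as text so the leading zero survives;
# nrows stops reading after the first two data rows
df = pd.read_csv(StringIO(csv_data), dtype={"zip": str}, nrows=2)

print(df["zip"].tolist())  # ['02134', '10001']
print(len(df))             # 2
```

Without the dtype argument, pandas would infer the zip column as integers and the leading zero in 02134 would be lost.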
Example
This example shows how to read a CSV file with a semicolon separator, skip the first row, and select only specific columns.
```python
import pandas as pd
from io import StringIO

# The first line is a title, not data, so it is skipped with skiprows=1
csv_data = '''Employee report
Name;Age;City;Salary
John;28;New York;70000
Anna;22;Los Angeles;80000
Mike;32;Chicago;65000'''

# Use StringIO to simulate a file object
file_like = StringIO(csv_data)

df = pd.read_csv(file_like, sep=';', skiprows=1, usecols=['Name', 'City'])
print(df)
```
Output
```
   Name         City
0  John     New York
1  Anna  Los Angeles
2  Mike      Chicago
```
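The header parameter also matters when a file has no header row at all. As a minimal sketch, assuming a headerless, semicolon-separated sample invented for illustration, passing header=None tells pandas to treat the first line as data and to label the columns with integers.

```python
import pandas as pd
from io import StringIO

# Hypothetical data with no header row
csv_data = "John;28\nAnna;22\n"

# header=None means the first line is data, not column names;
# the columns are then labelled 0, 1, ...
df = pd.read_csv(StringIO(csv_data), sep=";", header=None)

print(list(df.columns))  # [0, 1]
print(df.shape)          # (2, 2)
```

With the default header='infer', the same call would have used "John" and "28" as column names and returned only one data row.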
Common Pitfalls
Common mistakes when using read_csv include:
- Not setting the correct sep when the delimiter is not a comma.
- Forgetting that header=0 means the first row is used as column names, so skipping rows can misalign headers.
- Using usecols with column names that don't exist, which raises an error.
- Not handling missing values or incorrect data types.
Always check your CSV file format before setting parameters.
```python
import pandas as pd
from io import StringIO

csv_data = 'A|B|C\n1|2|3\n4|5|6'
file_like = StringIO(csv_data)

# Wrong: default sep=',' but the file uses '|'.
# No error is raised; every line lands in a single column instead.
df_wrong = pd.read_csv(file_like)
print(df_wrong.shape)

# Right: specify sep='|'
file_like.seek(0)  # reset pointer
df_right = pd.read_csv(file_like, sep='|')
print(df_right)
```
Output
```
(2, 1)
   A  B  C
0  1  2  3
1  4  5  6
```
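Two of the other pitfalls listed above can be reproduced the same way. This is a minimal sketch, assuming a two-column sample invented for illustration: skipping the header line shifts a data row into the column names, and asking usecols for a column that is not in the file raises a ValueError.

```python
import pandas as pd
from io import StringIO

# Hypothetical sample data, invented for this illustration
csv_data = "Name;Age\nJohn;28\nAnna;22\n"

# Pitfall: skiprows=1 removes the real header line, so the first data
# row ("John;28") is promoted to column names
df_shifted = pd.read_csv(StringIO(csv_data), sep=";", skiprows=1)
print(list(df_shifted.columns))  # ['John', '28']

# Pitfall: requesting a column that is not in the file raises ValueError
missing_error = None
try:
    pd.read_csv(StringIO(csv_data), sep=";", usecols=["Name", "City"])
except ValueError as exc:
    missing_error = exc
print(f"usecols failed: {missing_error is not None}")  # usecols failed: True
```

The first case is the more dangerous of the two because it fails silently: the DataFrame loads, but one record is lost and the column names are wrong.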
Quick Reference
| Parameter | Description | Default |
|---|---|---|
| filepath_or_buffer | File path, URL, or file-like object to read | Required (no default) |
| sep | Delimiter character | ',' |
| header | Row number for column names | 'infer' |
| usecols | Columns to read (list or callable) | None (all columns) |
| dtype | Data type for columns | None (infer) |
| skiprows | Number of rows to skip at the start, or list of row indices to skip | None |
| nrows | Number of rows to read | None (all rows) |
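
The table notes that usecols also accepts a callable, and skiprows accepts a list of row indices. The following is a rough sketch of both forms, assuming a small sample whose columns and values are invented for illustration.

```python
import pandas as pd
from io import StringIO

# Hypothetical sample data, invented for this illustration
csv_data = "id,name,note,score\n1,Ann,x,90\n2,Bob,y,85\n3,Cy,z,70\n"

df = pd.read_csv(
    StringIO(csv_data),
    # usecols as a callable: keep every column except 'note'
    usecols=lambda col: col != "note",
    # skiprows as a list: skip file line 2 (0-based, header is line 0),
    # which is the row for Bob
    skiprows=[2],
)

print(list(df.columns))     # ['id', 'name', 'score']
print(df["name"].tolist())  # ['Ann', 'Cy']
```

The callable form is convenient when it is easier to say which columns to exclude than to list every column to keep.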
Key Takeaways
- Specify the correct delimiter with the sep parameter; a wrong delimiter often produces a single mis-parsed column rather than an error.
- Use header to control which row is used as column names, especially when skipping rows.
- Select only needed columns with usecols to save memory and speed up loading.
- Always check your CSV file format before setting parameters to avoid common mistakes.
- read_csv exposes parameters for delimiters, headers, column selection, data types, and row limits, which covers most CSV loading needs.