How to Read CSV with Encoding in pandas | Simple Guide
Use pandas.read_csv() with the encoding parameter to specify the file's text encoding, such as encoding='utf-8' or encoding='latin1'. This ensures pandas correctly decodes characters from CSV files saved with different encodings.
Syntax
The basic syntax to read a CSV file with a specific encoding in pandas uses two arguments:
- filepath_or_buffer: The path to your CSV file.
- encoding: The text encoding of the file, e.g., 'utf-8', 'latin1', 'cp1252'.
Setting the correct encoding lets pandas decode the file's bytes into the right characters.
```python
import pandas as pd

df = pd.read_csv('file.csv', encoding='utf-8')
```
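To see why the encoding argument matters, it can help to look at the bytes directly. The following is a minimal sketch using only Python's built-in str.encode and bytes.decode (no pandas involved); the sample string is an assumption chosen for illustration.

```python
# The same text is stored as different bytes depending on the encoding.
text = "José"

latin1_bytes = text.encode("latin1")  # é is stored as one byte
utf8_bytes = text.encode("utf-8")     # é is stored as two bytes

# Decoding with the matching codec recovers the original text.
round_trip_ok = (
    latin1_bytes.decode("latin1") == text
    and utf8_bytes.decode("utf-8") == text
)

# UTF-8 bytes read as latin1 decode without error but give wrong characters.
mojibake = utf8_bytes.decode("latin1")
print(mojibake)  # JosÃ©

# latin1 bytes read as UTF-8 fail outright.
try:
    latin1_bytes.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
print(decode_failed)  # True
```

pandas performs the same decoding step when it reads a file, which is why a mismatched encoding shows up either as an exception or as garbled text.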
Example
This example shows how to read a CSV file saved with latin1 encoding. It prints the DataFrame to verify the data loads correctly.
```python
import pandas as pd

# Sample CSV content saved with latin1 encoding:
# name;age;city
# José;28;São Paulo
# Ana;22;Lisboa

# Reading the CSV with correct encoding and separator
df = pd.read_csv('sample_latin1.csv', encoding='latin1', sep=';')
print(df)
```
Output
   name  age       city
0  José   28  São Paulo
1   Ana   22     Lisboa
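To try this without an existing file, the sketch below first writes the sample content to a temporary file in latin1 and then reads it back. The temporary directory and the csv_text variable are assumptions added for illustration; only read_csv with encoding and sep comes from the example itself.

```python
import os
import tempfile

import pandas as pd

# Sample data mirroring the example (hypothetical content).
csv_text = "name;age;city\nJosé;28;São Paulo\nAna;22;Lisboa\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample_latin1.csv")

    # Write the file as latin1 so its bytes are not valid UTF-8.
    with open(path, "w", encoding="latin1", newline="") as f:
        f.write(csv_text)

    # Read it back with the matching encoding and separator.
    df = pd.read_csv(path, encoding="latin1", sep=";")

print(df)
```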
Common Pitfalls
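Common mistakes include:
- Not specifying encoding when the file is not UTF-8. pandas assumes UTF-8 by default, so the read typically raises a UnicodeDecodeError.
- Using the wrong encoding. A misspelled or unknown encoding name raises a LookupError, while a valid but mismatched encoding either raises a UnicodeDecodeError or silently produces wrong characters.
- Ignoring the separator when it is not a comma, which typically loads each row into a single column.
Always check the file encoding and delimiter before reading.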
```python
import pandas as pd

# Wrong way: no encoding specified for a latin1 file
# This may cause errors or wrong characters
# df = pd.read_csv('sample_latin1.csv', sep=';')

# Right way: specify encoding
df = pd.read_csv('sample_latin1.csv', encoding='latin1', sep=';')
print(df)
```
Output
   name  age       city
0  José   28  São Paulo
1   Ana   22     Lisboa
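The three pitfalls can be reproduced in memory. This is a rough sketch that uses io.BytesIO as a stand-in for a latin1 file on disk; the raw bytes and the misspelled encoding name 'latin-one' are made up for illustration, and the exact exception types reflect recent pandas versions.

```python
import io

import pandas as pd

# Hypothetical stand-in for a latin1-encoded, semicolon-separated file.
raw = "name;age;city\nJosé;28;São Paulo\nAna;22;Lisboa\n".encode("latin1")

# 1. No encoding given: pandas assumes UTF-8 and cannot decode the bytes.
try:
    pd.read_csv(io.BytesIO(raw), sep=";")
    default_failed = False
except UnicodeDecodeError:
    default_failed = True

# 2. Misspelled encoding name: Python has no codec by that name.
try:
    pd.read_csv(io.BytesIO(raw), encoding="latin-one", sep=";")
    bad_name_failed = False
except LookupError:
    bad_name_failed = True

# 3. Separator omitted: each row is parsed as one comma-separated field.
one_col = pd.read_csv(io.BytesIO(raw), encoding="latin1")

print(default_failed, bad_name_failed, one_col.shape)
```

The third case is the easiest to miss because it raises no error: the DataFrame loads, but with a single column named 'name;age;city'.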
Quick Reference
| Parameter | Description | Example Values |
|---|---|---|
| filepath_or_buffer | Path to the CSV file | 'data.csv', 'folder/file.csv' |
| encoding | Text encoding of the file | 'utf-8', 'latin1', 'cp1252' |
| sep | Field delimiter | ',' (default), ';', '\t' |
| error_bad_lines | Skip bad lines (deprecated in pandas 1.3 and removed in 2.0; use on_bad_lines) | False, True |
| on_bad_lines | How to handle bad lines | 'error', 'warn', 'skip' |
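As an illustration of the on_bad_lines row, the sketch below reads a small in-memory CSV whose third line has one field too many. The data is invented for the example, and on_bad_lines assumes pandas 1.3 or later.

```python
import io

import pandas as pd

# Hypothetical data: the third line has an extra field, making it a bad line.
csv_text = "name,age\nAna,22\nJosé,28,extra\nLuis,35\n"

# on_bad_lines='skip' drops rows with too many fields instead of raising.
df = pd.read_csv(io.StringIO(csv_text), on_bad_lines="skip")
print(df["name"].tolist())

# With the default on_bad_lines='error', the same input raises ParserError.
try:
    pd.read_csv(io.StringIO(csv_text))
    raised = False
except pd.errors.ParserError:
    raised = True
print(raised)
```

Skipping is convenient for exploration, but it silently discards rows, so 'warn' may be the safer choice when the data matters.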
Key Takeaways
Always specify the correct encoding in pandas.read_csv to avoid character errors.
Common encodings include 'utf-8' for most files and 'latin1' for some European files.
Check the CSV delimiter and specify it with the sep parameter if not a comma.
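If you get decoding errors, try different encodings like 'latin1' or 'cp1252'. Because 'latin1' maps every byte to a character, it never raises a decoding error, so inspect the loaded text to confirm the characters are right.
If you are unsure of the file's encoding, check it with a text editor such as Notepad++; the pandas documentation links to Python's list of standard encodings for valid names.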
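One way to apply the "try different encodings" advice is a small loop over candidate encodings. The helper below, read_csv_with_fallback, is hypothetical and not part of pandas; the candidate order and the sample bytes are assumptions. 'latin1' is placed last because it accepts any byte sequence, so a successful read with it does not prove the encoding is correct.

```python
import io

import pandas as pd


def read_csv_with_fallback(raw_bytes, encodings=("utf-8", "cp1252", "latin1"), **kwargs):
    """Try each encoding in order and return (DataFrame, encoding_used).

    Hypothetical helper for illustration; not a pandas function.
    """
    for enc in encodings:
        try:
            df = pd.read_csv(io.BytesIO(raw_bytes), encoding=enc, **kwargs)
            return df, enc
        except UnicodeDecodeError:
            continue  # this codec cannot decode the bytes; try the next one
    raise ValueError(f"none of {encodings} could decode the data")


# Sample bytes encoded as cp1252 (made up for the example).
raw = "name;city\nJosé;São Paulo\n".encode("cp1252")
df, used = read_csv_with_fallback(raw, sep=";")
print(used)
print(df)
```

Here UTF-8 is rejected because the accented bytes are not valid UTF-8, so the helper falls through to 'cp1252'. The result should still be checked by eye, since several single-byte encodings can decode the same bytes into different characters.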