
How to Read CSV with Encoding in pandas | Simple Guide

Use pandas.read_csv() with the encoding parameter to specify the file's text encoding, like encoding='utf-8' or encoding='latin1'. This ensures pandas correctly reads characters from CSV files saved with different encodings.

Syntax

The basic call to read a CSV file with a specific encoding takes two arguments:

  • filepath_or_buffer: The path to your CSV file.
  • encoding: The text encoding of the file, e.g., 'utf-8', 'latin1', 'cp1252'.

If you omit encoding, pandas assumes UTF-8, so a file saved in any other encoding needs the parameter set explicitly.

python
import pandas as pd

df = pd.read_csv('file.csv', encoding='utf-8')

Example

This example shows how to read a CSV file saved with latin1 encoding. It prints the DataFrame to verify the data loads correctly.

python
import pandas as pd

# Sample CSV content saved with latin1 encoding:
# name;age;city
# José;28;São Paulo
# Ana;22;Lisboa

# Reading the CSV with correct encoding and separator

df = pd.read_csv('sample_latin1.csv', encoding='latin1', sep=';')
print(df)
Output
   name  age       city
0  José   28  São Paulo
1   Ana   22     Lisboa
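
To try the example without an existing file, the sketch below first writes the same sample rows to disk as latin1 and then reads them back. The temporary directory is an assumption made for illustration; only the file name and the read_csv() call come from the example above.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Same rows as the sample above; the temporary location is only for illustration.
csv_text = "name;age;city\nJosé;28;São Paulo\nAna;22;Lisboa\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "sample_latin1.csv"
    # Encode as latin1 on the way out, so the bytes on disk are not valid UTF-8.
    path.write_text(csv_text, encoding="latin1")

    # Read it back with the matching encoding and separator.
    df = pd.read_csv(path, encoding="latin1", sep=";")

print(df)
```

Because the file is written and read with the same encoding, the accented characters should come back unchanged.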

Common Pitfalls

Common mistakes include:

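  • Not specifying encoding when the file is not UTF-8. pandas then decodes it as UTF-8, which typically raises a UnicodeDecodeError.
  • Using the wrong encoding name. A name Python does not recognize raises a LookupError; a valid name that does not match the file can raise a UnicodeDecodeError or silently produce wrong characters.
  • Ignoring the separator when it is not a comma. Each row is then read as a single column.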

Always check the file encoding and delimiter before reading.

python
import pandas as pd

# Wrong way: no encoding specified for a latin1 file
# This may cause errors or wrong characters
# df = pd.read_csv('sample_latin1.csv', sep=';')

# Right way: specify encoding

df = pd.read_csv('sample_latin1.csv', encoding='latin1', sep=';')
print(df)
Output
   name  age       city
0  José   28  São Paulo
1   Ana   22     Lisboa
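
The three pitfalls can be reproduced in a minimal sketch like the one below. It assumes a small latin1 file written to a temporary directory, and 'not-a-codec' is a deliberately invalid encoding name chosen for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "sample_latin1.csv"
    path.write_bytes("name;age;city\nJosé;28;São Paulo\n".encode("latin1"))

    # Pitfall 1: no encoding given, so pandas tries UTF-8 on latin1 bytes.
    try:
        pd.read_csv(path, sep=";")
        default_result = "no error"
    except UnicodeDecodeError:
        default_result = "UnicodeDecodeError"

    # Pitfall 2: an encoding name that Python does not recognize.
    try:
        pd.read_csv(path, encoding="not-a-codec", sep=";")
        bad_name_result = "no error"
    except LookupError:
        bad_name_result = "LookupError"

    # Pitfall 3: correct encoding, but the default comma separator.
    one_column = pd.read_csv(path, encoding="latin1")

print(default_result)
print(bad_name_result)
print(one_column.shape)  # a single column means the separator was not recognized
```

A quick look at df.shape or df.columns after loading is a cheap way to catch the separator problem early.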

Quick Reference

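Parameter          | Description                                                  | Example Values
-------------------|--------------------------------------------------------------|-------------------------------
filepath_or_buffer | Path to the CSV file                                         | 'data.csv', 'folder/file.csv'
encoding           | Text encoding of the file                                    | 'utf-8', 'latin1', 'cp1252'
sep                | Field delimiter                                              | ',' (default), ';', '\t'
error_bad_lines    | Skip bad lines (deprecated, removed in pandas 2.0; use on_bad_lines) | False, True
on_bad_lines       | How to handle bad lines                                      | 'error', 'warn', 'skip'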
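
As a rough illustration of on_bad_lines together with encoding, the sketch below reads latin1 bytes from memory and skips a malformed row. The data is hypothetical (the 'Ana' row has one field too many), and the example assumes pandas 1.3 or later, where on_bad_lines is available.

```python
import io

import pandas as pd

# Hypothetical data: the third line has four fields instead of three.
raw = "name;age;city\nJosé;28;São Paulo\nAna;22;Lisboa;extra\nRui;35;Porto\n"
data = io.BytesIO(raw.encode("latin1"))

# on_bad_lines='skip' drops rows with too many fields instead of raising an error.
df = pd.read_csv(data, encoding="latin1", sep=";", on_bad_lines="skip")
print(df)
```

Skipping rows silently loses data, so 'warn' may be the safer choice while exploring an unfamiliar file.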

Key Takeaways

  • Specify the file's actual encoding in pandas.read_csv(); when encoding is omitted, pandas assumes UTF-8.
  • Common encodings are 'utf-8' for most files and 'latin1' or 'cp1252' for files from older Western European or Windows software.
  • Check the CSV delimiter and pass it with the sep parameter if it is not a comma.
  • If you get a UnicodeDecodeError, try another encoding such as 'cp1252' or 'latin1'.
  • If you are unsure of a file's encoding, a text editor such as Notepad++ can display it.
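
The "try different encodings" advice can be sketched as a small helper. The function name read_csv_with_fallback and the candidate order are assumptions for illustration, not part of pandas. Note that 'latin1' can decode any byte sequence, so it goes last, and a successful read does not prove the characters are right.

```python
import io

import pandas as pd


def read_csv_with_fallback(source_bytes, encodings=("utf-8", "cp1252", "latin1"), **kwargs):
    """Hypothetical helper: try each encoding in order.

    Returns the DataFrame and the first encoding that decoded without error.
    """
    for encoding in encodings:
        try:
            df = pd.read_csv(io.BytesIO(source_bytes), encoding=encoding, **kwargs)
            return df, encoding
        except UnicodeDecodeError:
            continue  # this encoding cannot decode the bytes; try the next one
    raise ValueError(f"none of {encodings} could decode the data")


# Usage: latin1 bytes are not valid UTF-8, so the first candidate is rejected.
sample = "name;city\nJosé;São Paulo\n".encode("latin1")
df, used = read_csv_with_fallback(sample, sep=";")
print(used)
print(df)
```

Always inspect a few values with accented characters after a fallback read, since two encodings can both decode a file yet disagree on what the bytes mean.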