
How to Set dtype in read_csv in pandas: Simple Guide

Use the dtype parameter in pandas.read_csv() to specify the data type for one or more columns. Pass a dictionary where keys are column names and values are the desired data types, like {'col1': 'int', 'col2': 'float'}. This ensures pandas reads the columns with the correct types directly.

Syntax

The dtype parameter in pandas.read_csv() lets you set the data type for columns when loading a CSV file.

  • dtype: either a single type applied to every column, or a dictionary mapping column names to data types (such as 'int', 'float', 'str').
  • Columns left out of the dictionary are still inferred, so you only need to list the ones pandas might guess wrong.
python
pandas.read_csv(filepath_or_buffer, dtype={'column_name': 'data_type', ...})
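Besides a dictionary, dtype also accepts a single type, which pandas applies to every column. The minimal sketch below uses made-up ZIP code data (the column names are hypothetical) to show how dtype=str keeps leading zeros that default inference would drop.

```python
import pandas as pd
from io import StringIO

# Hypothetical data: ZIP codes with leading zeros
csv_data = "zip,city\n02134,Boston\n10001,New York\n"

# Default inference parses 'zip' as integers, dropping the leading zero
df_inferred = pd.read_csv(StringIO(csv_data))
print(df_inferred["zip"].tolist())  # [2134, 10001]

# A single dtype applies to every column, so 'zip' stays text
df_text = pd.read_csv(StringIO(csv_data), dtype=str)
print(df_text["zip"].tolist())  # ['02134', '10001']
```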

Example

This example shows how to read a CSV file and set the data types for specific columns using the dtype parameter.

python
import pandas as pd
from io import StringIO

# Sample CSV data
csv_data = '''
name,age,salary
Alice,30,70000
Bob,25,50000
Charlie,35,80000
'''

# Use StringIO to simulate a file
csv_file = StringIO(csv_data)

# Read CSV with dtype set for 'age' and 'salary'
df = pd.read_csv(csv_file, dtype={'age': 'int32', 'salary': 'float64'})

print(df)
print(df.dtypes)
Output
      name  age   salary
0    Alice   30  70000.0
1      Bob   25  50000.0
2  Charlie   35  80000.0
name       object
age         int32
salary    float64
dtype: object
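The type values in the dictionary do not have to be strings. As a sketch on the same sample data, NumPy type objects such as np.int32 and np.float64 should give the same column types as the string aliases used above.

```python
import numpy as np
import pandas as pd
from io import StringIO

csv_data = "name,age,salary\nAlice,30,70000\nBob,25,50000\nCharlie,35,80000\n"

# String aliases and NumPy type objects are interchangeable in dtype
df_str = pd.read_csv(StringIO(csv_data), dtype={"age": "int32", "salary": "float64"})
df_np = pd.read_csv(StringIO(csv_data), dtype={"age": np.int32, "salary": np.float64})

print(list(df_str.dtypes) == list(df_np.dtypes))  # True
print(df_np["age"].dtype)  # int32
```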

Common Pitfalls

Common mistakes when setting dtype in read_csv include:

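  • Using a data type that does not match the actual data, which makes read_csv raise a ValueError.
  • Passing a single data type (for example dtype='int') when only some columns hold that type; a single type is applied to every column, so text columns fail to convert.
  • Not specifying dtype for columns with mixed or ambiguous values, leaving pandas to infer a type that may be wrong (for example, object instead of numbers, or integers that lose leading zeros).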
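To illustrate the single-type pitfall, here is a minimal sketch with sample columns like those above: dtype="int32" is applied to every column, including the text column 'name', so the read fails, while a dictionary limited to the numeric columns succeeds. The exact error message may vary between pandas versions.

```python
import pandas as pd
from io import StringIO

csv_data = "name,age,salary\nAlice,30,70000\nBob,25,50000\n"

# A single type applies to ALL columns, so 'name' cannot be converted
raised = False
try:
    pd.read_csv(StringIO(csv_data), dtype="int32")
except ValueError as e:
    raised = True
    print(f"Error: {e}")

# A dictionary targets only the columns that really hold integers
df = pd.read_csv(StringIO(csv_data), dtype={"age": "int32", "salary": "int32"})
print(df.dtypes)
```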

Always check your data and use a dictionary to map column names to types.

python
import pandas as pd
from io import StringIO

csv_data = '''
name,age,salary
Alice,30,70000
Bob,twentyfive,50000
'''
csv_file = StringIO(csv_data)

# Wrong: 'age' column has a string 'twentyfive' but dtype is int
try:
    df_wrong = pd.read_csv(csv_file, dtype={'age': 'int'})
except ValueError as e:
    print(f"Error: {e}")

# Workaround: read without dtype; 'age' then loads as strings (object) and must be cleaned before converting
csv_file.seek(0)
df_correct = pd.read_csv(csv_file)
print(df_correct)
Output
Error: invalid literal for int() with base 10: 'twentyfive'
    name         age  salary
0  Alice          30   70000
1    Bob  twentyfive   50000
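The workaround above leaves 'age' as strings. One way to fix the data before setting the type, sketched here with pd.to_numeric, is to read the column as text, coerce values that are not numbers to missing, and then convert to pandas' nullable 'Int64' type, which (unlike 'int') can hold missing values. Treating 'twentyfive' as missing is an assumption for illustration; real data may call for mapping such values to numbers instead.

```python
import pandas as pd
from io import StringIO

csv_data = "name,age,salary\nAlice,30,70000\nBob,twentyfive,50000\n"

# Read 'age' as text so the bad value does not abort the load
df = pd.read_csv(StringIO(csv_data), dtype={"age": str, "salary": "float64"})

# Coerce non-numeric entries to NaN, then use the nullable integer type
df["age"] = pd.to_numeric(df["age"], errors="coerce").astype("Int64")

print(df)
print(df.dtypes)
```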

Quick Reference

Summary tips for using dtype in read_csv:

  • Use a dictionary to assign types per column.
  • Common types: 'int', 'float', 'str', 'category'.
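  • Smaller types such as 'int32', or 'category' for repetitive text, can reduce memory usage, and explicit types stop pandas from inferring the wrong type.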
  • Check your data for invalid values before forcing types.
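The memory claim can be checked with memory_usage. This is a minimal sketch on synthetic data (the column names 'level' and 'color' and the row count are made up): int32 stores 4 bytes per value versus 8 for the default int64, and 'category' stores each distinct string once plus small integer codes.

```python
import pandas as pd
from io import StringIO

# Synthetic CSV: 1,000 rows with a small-integer column and a repetitive text column
rows = "\n".join(f"{i % 100},{'red' if i % 2 else 'blue'}" for i in range(1000))
csv_data = "level,color\n" + rows + "\n"

df_default = pd.read_csv(StringIO(csv_data))
df_typed = pd.read_csv(StringIO(csv_data), dtype={"level": "int32", "color": "category"})

for label, frame in [("default", df_default), ("typed", df_typed)]:
    # deep=True counts the actual string data in text columns
    print(label, frame.memory_usage(index=False, deep=True).to_dict())
```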

Key Takeaways

  • Use the dtype parameter with a dictionary to set column types in read_csv.
  • Matching dtype to the actual data prevents conversion errors, and smaller types can lower memory use.
  • Common dtypes include int, float, str, and category.
  • Do not force a dtype on a column that contains values invalid for that type; clean the data first.
  • Setting dtype helps pandas read data correctly and efficiently.