How to write parquet in pandas


How to Write Parquet Files Using pandas DataFrame

To write a parquet file in pandas, use the DataFrame.to_parquet() method. This saves your DataFrame in Parquet, a compressed columnar format that is compact on disk and fast to read, especially when only some of the columns are needed.

📐

Syntax

The basic syntax to write a parquet file from a pandas DataFrame is:

python

df.to_parquet(path, engine='auto', compression='snappy')

Here, path is the file name or path where the parquet file will be saved.

engine specifies the parquet library to use. The default, 'auto', tries pyarrow first and falls back to fastparquet.

compression controls the compression codec (default is 'snappy'). Pass None to write uncompressed data.

💻

Example

This example shows how to create a simple DataFrame and save it as a parquet file named example.parquet. Then it reads the file back to verify the content.

python

import pandas as pd

# Create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)

# Write DataFrame to parquet file
df.to_parquet('example.parquet')

# Read the parquet file back
df_read = pd.read_parquet('example.parquet')
print(df_read)

Output

      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35

⚠️

Common Pitfalls

Common mistakes when writing parquet files in pandas include:

Not having the required parquet engine installed (like pyarrow or fastparquet).
Using unsupported compression types.
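Passing an invalid file path, such as one inside a directory that does not exist. A missing .parquet extension does not cause an error, but it makes the file harder to identify.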

Always ensure you have installed a parquet engine with pip install pyarrow or pip install fastparquet.

python

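import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]})

# With engine='auto' (the default), pandas tries pyarrow, then fastparquet.
# If neither is installed, to_parquet() raises ImportError.
try:
    df.to_parquet('file.parquet')
except ImportError as err:
    print(f'No parquet engine available: {err}')

# Naming the engine makes the dependency explicit,
# but the package must still be installed.
# df.to_parquet('file.parquet', engine='pyarrow')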
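The first two pitfalls above can also be caught before calling to_parquet(). The following is a minimal sketch, not part of pandas: available_parquet_engines and check_compression are hypothetical helper names, and the codec set is the list given in the pandas documentation for to_parquet(), which may differ from what your installed engine actually supports.

```python
import importlib.util

# Codecs listed in the pandas docs for to_parquet(); None means no compression.
SUPPORTED_COMPRESSION = {'snappy', 'gzip', 'brotli', 'lz4', 'zstd', None}


def available_parquet_engines():
    """Return the parquet engines that can be imported (hypothetical helper)."""
    candidates = ['pyarrow', 'fastparquet']
    return [name for name in candidates if importlib.util.find_spec(name) is not None]


def check_compression(name):
    """Return name unchanged, or raise ValueError for an unknown codec (hypothetical helper)."""
    if name not in SUPPORTED_COMPRESSION:
        raise ValueError(f'Unsupported compression: {name!r}')
    return name


engines = available_parquet_engines()
if engines:
    print(f'Parquet engines found: {engines}')
else:
    print('No parquet engine found. Run: pip install pyarrow')

print(check_compression('gzip'))
```

Checking with importlib.util.find_spec avoids importing the engine just to see whether it exists.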

📊

Quick Reference

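Parameter	Description	Default
path	File path or file-like object to write to; if None, the parquet data is returned as bytes	None
engine	Parquet library to use ('pyarrow', 'fastparquet', or 'auto')	'auto'
compression	Compression codec ('snappy', 'gzip', 'brotli', 'lz4', 'zstd', or None for no compression)	'snappy'
index	Whether to write the DataFrame index; None behaves like True, except that a default RangeIndex is stored as metadata rather than as a column	None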

✅

Key Takeaways

Use df.to_parquet() to save pandas DataFrames as parquet files easily.

Install a parquet engine like pyarrow or fastparquet before writing parquet files.

Compression is on by default ('snappy'); pass another codec such as 'gzip' when smaller files matter more than write speed.

Ensure the file path is valid and has a .parquet extension for clarity.

You can read parquet files back with pd.read_parquet() to verify data.