
How to Write Parquet Files Using pandas DataFrame

To write a parquet file in pandas, use the DataFrame.to_parquet() method. This saves your DataFrame in the Parquet format, a columnar binary format that is compact on disk and fast to read back.
📐 Syntax

The basic syntax to write a parquet file from a pandas DataFrame is:

python
df.to_parquet(path, engine='auto', compression='snappy')

Here, path is the file name or path where the parquet file will be saved, engine specifies the parquet library to use (default 'auto'), and compression controls the compression method (default 'snappy').
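For instance, you can swap in a different compression codec: 'gzip' usually produces smaller files than the default 'snappy' at the cost of slower writes, and None disables compression entirely. The file names below are illustrative:

python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# gzip compresses more aggressively than the default 'snappy'
df.to_parquet('example_gzip.parquet', compression='gzip')

# None disables compression entirely
df.to_parquet('example_raw.parquet', compression=None)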
💻 Example

This example shows how to create a simple DataFrame and save it as a parquet file named example.parquet. Then it reads the file back to verify the content.

python
import pandas as pd

# Create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)

# Write DataFrame to parquet file
df.to_parquet('example.parquet')

# Read the parquet file back
df_read = pd.read_parquet('example.parquet')
print(df_read)
Output

      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35
⚠️ Common Pitfalls

Common mistakes when writing parquet files in pandas include:

  • Not having the required parquet engine installed (like pyarrow or fastparquet).
  • Using unsupported compression types.
  • Passing an invalid file path or missing file extension.

Always ensure you have installed a parquet engine with pip install pyarrow or pip install fastparquet.

python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]})

# Wrong: calling to_parquet() without any parquet engine installed
# df.to_parquet('file.parquet')  # Raises ImportError if neither pyarrow nor fastparquet is available

# Right: Specify engine and ensure it is installed
# df.to_parquet('file.parquet', engine='pyarrow')
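One defensive pattern (a sketch, not something pandas requires; the helper name is hypothetical) is to probe for an installed engine before writing:

python
import pandas as pd

def available_parquet_engine():
    """Return the name of an installed parquet engine, or None if neither is present."""
    for module in ('pyarrow', 'fastparquet'):
        try:
            __import__(module)
            return module
        except ImportError:
            pass
    return None

df = pd.DataFrame({'A': [1, 2, 3]})
engine = available_parquet_engine()
if engine is not None:
    df.to_parquet('file.parquet', engine=engine)
else:
    print('Install pyarrow or fastparquet first')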
📊 Quick Reference

Parameter   | Description                                                  | Default
path        | File path or object where the parquet file is saved          | Required
engine      | Parquet library to use ('pyarrow', 'fastparquet', or 'auto') | 'auto'
compression | Compression method ('snappy', 'gzip', 'brotli', None)        | 'snappy'
index       | Whether to write the DataFrame index to the file             | None
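The index parameter from the table above is worth a quick demonstration: passing index=False omits the DataFrame index from the file, which keeps it lean when the index is just a default RangeIndex (the file name is illustrative):

python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# Omit the default RangeIndex from the stored file
df.to_parquet('no_index.parquet', index=False)

# Reading back yields a fresh RangeIndex and only the data columns
df_read = pd.read_parquet('no_index.parquet')
print(df_read)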

Key Takeaways

  • Use df.to_parquet() to save pandas DataFrames as parquet files.
  • Install a parquet engine such as pyarrow or fastparquet before writing.
  • Specify compression to reduce file size; 'snappy' is a good default.
  • Use a valid file path with a .parquet extension for clarity.
  • Read parquet files back with pd.read_parquet() to verify the data.