How to Write Parquet Files Using pandas DataFrame
To write a parquet file in pandas, use the DataFrame.to_parquet() method. This saves your DataFrame in the parquet format, which is efficient for storage and fast to read.
Syntax
The basic syntax to write a parquet file from a pandas DataFrame is:
df.to_parquet(path, engine='auto', compression='snappy')
Here, path is the file name or path where the parquet file will be saved.
engine specifies the parquet library to use; the default 'auto' tries pyarrow first and falls back to fastparquet.
compression controls the compression method (default is 'snappy').
Example
This example shows how to create a simple DataFrame and save it as a parquet file named example.parquet. Then it reads the file back to verify the content.
```python
import pandas as pd

# Create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)

# Write DataFrame to parquet file
df.to_parquet('example.parquet')

# Read the parquet file back
df_read = pd.read_parquet('example.parquet')
print(df_read)
```
Output
```
      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35
```
Common Pitfalls
Common mistakes when writing parquet files in pandas include:
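- Not having the required parquet engine installed (like pyarrow or fastparquet).
- Using unsupported compression types.
- Passing an invalid file path (for example, a directory that does not exist) or leaving off the .parquet extension, which makes the file harder to identify.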
Always ensure you have installed a parquet engine with pip install pyarrow or pip install fastparquet.
```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]})

# Wrong: No parquet engine installed or specified
# df.to_parquet('file.parquet')  # This may raise an error if no engine

# Right: Specify engine and ensure it is installed
# df.to_parquet('file.parquet', engine='pyarrow')
```
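To find out whether an engine is installed before you attempt a write, you can look it up with the standard library. The helper name available_parquet_engines below is hypothetical, and the check only confirms that the package can be found; it does not confirm that the installed version is one pandas accepts.

```python
from importlib.util import find_spec


def available_parquet_engines():
    """Return the names of the parquet engines that can be imported (hypothetical helper)."""
    candidates = ("pyarrow", "fastparquet")
    return [name for name in candidates if find_spec(name) is not None]


engines = available_parquet_engines()
if engines:
    print(f"Parquet engines found: {', '.join(engines)}")
else:
    print("No parquet engine found; run: pip install pyarrow")
```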
Quick Reference
| Parameter | Description | Default |
|---|---|---|
| path | File path or file-like object to write to; if None, the result is returned as bytes | None |
| engine | Parquet library to use ('pyarrow', 'fastparquet', or 'auto') | 'auto' |
| compression | Compression method ('snappy', 'gzip', 'brotli', 'lz4', 'zstd', or None for no compression) | 'snappy' |
| index | Whether to include the DataFrame index in the file; None behaves like True, except that a RangeIndex is stored as metadata only | None |
Key Takeaways
Use df.to_parquet() to save pandas DataFrames as parquet files easily.
Install a parquet engine like pyarrow or fastparquet before writing parquet files.
Specify compression to reduce file size; 'snappy' is a good default.
Ensure the file path is valid and has a .parquet extension for clarity.
You can read parquet files back with pd.read_parquet() to verify data.