
How to Use pandas profiling in Python for Data Analysis

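To use pandas_profiling in Python, first install it with pip install pandas-profiling. Then import ProfileReport and call it on a pandas DataFrame to generate an interactive HTML report that summarizes your data. Note that the project has since been renamed ydata-profiling (imported as ydata_profiling); the pandas-profiling package still installs but is no longer updated.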
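
Because a missing install only shows up as an ImportError at import time, it can help to check for the package up front. The sketch below is one possible way to do that; load_profile_report is a hypothetical helper name, and it assumes the class may live in either pandas_profiling (the name used in this article) or ydata_profiling (the name the project was later renamed to).

```python
import importlib


def load_profile_report():
    """Return the ProfileReport class from whichever profiling package is installed."""
    # Try the renamed package first, then the name used in this article.
    for module_name in ("ydata_profiling", "pandas_profiling"):
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # Not installed; try the next candidate.
        return module.ProfileReport
    # Neither package could be imported, so give an actionable message.
    raise ImportError(
        "No profiling package found. Install one with: pip install pandas-profiling"
    )
```

Calling ProfileReport = load_profile_report() at the top of a script then either returns the class or fails with an install hint.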

Syntax

The basic syntax to create a pandas profiling report is:

  • ProfileReport(df): Creates a report object from a pandas DataFrame df.
  • .to_file(output_file): Saves the report as an HTML file.
  • .to_notebook_iframe(): Displays the report inside a Jupyter notebook.
python
import pandas as pd
from pandas_profiling import ProfileReport

# Any pandas DataFrame works; a tiny placeholder is used here
df = pd.DataFrame({"value": [1, 2, 3]})

# Create a profile report from the DataFrame
profile = ProfileReport(df, title="Pandas Profiling Report")

# Save the report to an HTML file
profile.to_file("report.html")
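
The syntax list above mentions two ways to get output: .to_file() and .to_notebook_iframe(). As a rough sketch, the choice between them can be wrapped in a small function. show_or_save is a hypothetical helper, not part of pandas_profiling, and it only assumes that the object passed in has those two methods.

```python
def show_or_save(profile, output_file=None):
    """Save the report if a path is given, otherwise render it inline in Jupyter."""
    if output_file is not None:
        # Writes a standalone HTML file that opens in any browser.
        profile.to_file(output_file)
        return output_file
    # Only produces visible output when run inside a Jupyter notebook.
    profile.to_notebook_iframe()
    return None
```

In a script you would call show_or_save(profile, "report.html"); in a notebook cell, show_or_save(profile) is enough.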

Example

This example shows how to generate a profiling report for a simple dataset and save it as an HTML file.

python
from pandas_profiling import ProfileReport
import pandas as pd

# Sample data
data = {
    "name": ["Alice", "Bob", "Charlie", "David"],
    "age": [25, 30, 35, 40],
    "city": ["New York", "Los Angeles", "Chicago", "Houston"]
}
df = pd.DataFrame(data)

# Generate the profile report
profile = ProfileReport(df, title="Sample Data Report", explorative=True)

# Save the report to an HTML file
profile.to_file("sample_report.html")

print("Report generated and saved as sample_report.html")
Output
Report generated and saved as sample_report.html

Common Pitfalls

Common mistakes when using pandas profiling include:

  • Not installing the package before importing it (pip install pandas-profiling is required).
  • Passing a non-pandas DataFrame object to ProfileReport.
  • Trying to generate reports on very large datasets without sampling, which can cause performance issues.
  • Forgetting to save or display the report, so no output is seen.

Always check that your input is a pandas DataFrame, and consider the minimal=True option for large datasets, which turns off the most expensive computations.

python
import pandas as pd
from pandas_profiling import ProfileReport

# Wrong: passing a list instead of a DataFrame
# profile = ProfileReport([1, 2, 3])  # This will cause an error

# Right: convert the list to a DataFrame first
df = pd.DataFrame([1, 2, 3], columns=["numbers"])
profile = ProfileReport(df)
profile.to_file("correct_report.html")
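
The large-dataset pitfall listed above can be handled by sampling rows before profiling and turning on minimal mode. The following is a minimal sketch under stated assumptions: sample_for_profiling and build_minimal_report are hypothetical helper names, and the 10,000-row cap is an arbitrary illustration, not a recommended limit. The profiling import sits inside the function so the sampling helper can be used on its own.

```python
def sample_for_profiling(df, max_rows=10_000, seed=0):
    """Return df unchanged if it is small, else a reproducible random sample.

    df is expected to be a pandas DataFrame (anything with len() and .sample()).
    """
    if len(df) <= max_rows:
        return df
    # DataFrame.sample draws rows at random; random_state makes it repeatable.
    return df.sample(n=max_rows, random_state=seed)


def build_minimal_report(df, title="Sampled Data Report"):
    """Create a report in minimal mode from a (possibly sampled) DataFrame."""
    # Imported lazily so the helper above works without the profiling package.
    from pandas_profiling import ProfileReport

    return ProfileReport(df, title=title, minimal=True)
```

A typical call would be build_minimal_report(sample_for_profiling(df)).to_file("sampled_report.html"). Keep in mind that the statistics then describe the sample rather than the full dataset.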

Quick Reference

Summary tips for using pandas profiling:

  • Install with pip install pandas-profiling.
  • Import with from pandas_profiling import ProfileReport.
  • Create report: ProfileReport(df).
  • Save report: .to_file('report.html').
  • Display in Jupyter: .to_notebook_iframe().
  • Use explorative=True for more detailed analysis.
  • Use minimal=True for large datasets to improve speed.

Key Takeaways

Install pandas-profiling with pip before using it.
Use ProfileReport on a pandas DataFrame to generate a detailed data report.
Save the report as an HTML file or display it in a Jupyter notebook.
For large datasets, use minimal mode to avoid performance issues.
Always ensure your input is a valid pandas DataFrame.