
How to Use ydata-profiling in Python for Data Analysis

To use ydata-profiling in Python, first install it with pip install ydata-profiling. Then import ProfileReport from ydata_profiling, create a profile from a pandas DataFrame, and generate an HTML report with profile.to_file().
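Before importing, it can help to confirm that the package is actually available in the active environment. The following is a minimal sketch using only the standard library; the helper name is_installed is hypothetical and not part of ydata-profiling.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


# The import name uses an underscore, while the pip package name uses a hyphen.
if not is_installed("ydata_profiling"):
    print("ydata_profiling not found; run: pip install ydata-profiling")
```

Checking up front gives a clearer message than a bare ModuleNotFoundError raised halfway through a script.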

Syntax

The basic syntax to create a profile report using ydata-profiling involves importing ProfileReport, passing a pandas DataFrame to it, and then saving or displaying the report.

  • ProfileReport(df): Creates a profile report object from the DataFrame df.
  • profile.to_file(output_file): Saves the report as an HTML file.
  • profile.to_notebook_iframe(): Displays the report inside a Jupyter notebook.
python
from ydata_profiling import ProfileReport
import pandas as pd

df = pd.DataFrame({"value": [1, 2, 3]})  # Replace with your own data; an empty DataFrame cannot be profiled
profile = ProfileReport(df)
profile.to_file("output.html")
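Because the same script may run both inside and outside Jupyter, one option is to pick the output method at runtime. This is only a sketch: in_notebook and show_or_save are hypothetical helper names, and the check on the shell class name is a common heuristic for detecting a Jupyter kernel, not something ydata-profiling provides.

```python
def in_notebook() -> bool:
    """Heuristic: True when running under a Jupyter kernel (assumption, not guaranteed)."""
    try:
        from IPython import get_ipython
    except ImportError:
        return False
    shell = get_ipython()
    return shell is not None and shell.__class__.__name__ == "ZMQInteractiveShell"


def show_or_save(profile, output_file: str = "output.html") -> None:
    """Display the report inline in a notebook, otherwise save it as HTML."""
    if in_notebook():
        profile.to_notebook_iframe()
    else:
        profile.to_file(output_file)


# Usage, assuming `profile` is a ProfileReport built as shown above:
# show_or_save(profile, "output.html")
```

In a plain Python script the helper falls back to to_file(), so the report is never silently discarded.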

Example

This example shows how to load a sample dataset, create a profile report, and save it as an HTML file named report.html. You can open this file in a browser to explore your data.

python
from ydata_profiling import ProfileReport
import pandas as pd

# Load sample data
url = "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/titanic.csv"
df = pd.read_csv(url)

# Create profile report
profile = ProfileReport(df, title="Titanic Dataset Profile", explorative=True)

# Save report to HTML file
profile.to_file("report.html")
Output
Progress bars are shown while the report is built, and report.html is written to the current working directory.

Common Pitfalls

Common mistakes when using ydata-profiling include:

  • Not installing the package before importing it.
  • Passing something other than a DataFrame, such as a plain list, which raises an error.
  • Profiling a very large dataset without sampling, which can be slow or run out of memory.
  • Forgetting to save the report or display it in notebooks.

Always ensure your data is a pandas DataFrame and consider using sampling for big data.

python
from ydata_profiling import ProfileReport
import pandas as pd

# Wrong: passing a list instead of DataFrame
# profile = ProfileReport([1, 2, 3])  # This will raise an error

# Right:
df = pd.DataFrame([1, 2, 3], columns=["numbers"])
profile = ProfileReport(df)
profile.to_file("correct_report.html")
Output
Progress bars are shown while the report is built, and correct_report.html is written to the current working directory.
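For the large-dataset pitfall, one possible approach is to profile a random sample instead of the full table. The sketch below is illustrative: the helper names sample_for_profiling and profile_sample, the 10,000-row cap, and the fixed seed are all assumptions chosen for the example. It also passes minimal=True, a ProfileReport option that turns off the more expensive computations.

```python
import pandas as pd


def sample_for_profiling(df: pd.DataFrame, max_rows: int = 10_000, seed: int = 0) -> pd.DataFrame:
    """Return df unchanged if it is small enough, otherwise a random sample of max_rows rows."""
    if len(df) <= max_rows:
        return df
    return df.sample(n=max_rows, random_state=seed)


def profile_sample(df: pd.DataFrame, output_file: str, max_rows: int = 10_000) -> None:
    """Profile a sample of df and save the report (requires ydata-profiling)."""
    from ydata_profiling import ProfileReport  # imported here so sampling works without it

    sampled = sample_for_profiling(df, max_rows=max_rows)
    profile = ProfileReport(sampled, title="Sampled Profile", minimal=True)
    profile.to_file(output_file)


# Demonstrate the sampling step on synthetic data
big = pd.DataFrame({"x": range(50_000)})
small = sample_for_profiling(big, max_rows=1_000)
print(len(small))
```

A fixed random_state keeps the sample reproducible between runs. Note that a sample can miss rare values, so statistics in the report describe the sample rather than the full dataset.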

Quick Reference

Here is a quick summary of key ydata-profiling commands:

Command                                Description
ProfileReport(df)                      Create a profile report from DataFrame df
profile.to_file("file.html")           Save the report as an HTML file
profile.to_notebook_iframe()           Display the report inside a Jupyter notebook
ProfileReport(df, explorative=True)    Enable advanced analysis features
profile.to_widgets()                   Show interactive widgets in Jupyter

Key Takeaways

  • Install ydata-profiling with pip before using it.
  • Always pass a pandas DataFrame to ProfileReport.
  • Use profile.to_file() to save the report as HTML.
  • For large datasets, consider sampling to avoid slow reports.
  • You can display reports directly in Jupyter notebooks with to_notebook_iframe().