How to Check Percentage of Missing Values in pandas DataFrame
To check the percentage of missing values in a pandas DataFrame, use `df.isnull().mean() * 100`. This calculates the fraction of missing values in each column and converts it to a percentage.

Syntax
The main syntax to find the percentage of missing values in pandas is:

- `df.isnull()`: returns a DataFrame of the same shape with `True` where values are missing and `False` elsewhere.
- `.mean()`: calculates the mean of each column, treating `True` as 1 and `False` as 0, which gives the fraction of missing values.
- Multiplying by 100 converts the fraction to a percentage.

```python
df.isnull().mean() * 100
```

Example
This example shows how to create a DataFrame with missing values and calculate the percentage of missing data per column.
```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', None, 'David'],
        'Age': [25, None, 30, 22],
        'City': ['New York', 'Los Angeles', 'Chicago', None]}
df = pd.DataFrame(data)

# Percentage of missing values in each column
missing_percentage = df.isnull().mean() * 100
print(missing_percentage)
```
Output
```
Name    25.0
Age     25.0
City    25.0
dtype: float64
```
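
To see what each part of the chain contributes, you can print the intermediate results for the same data. The sketch below also checks that `mean()` on the boolean mask gives the same result as dividing the missing count by the number of rows; the variable names `mask`, `fraction`, and `fraction_manual` are illustrative only.

```python
import pandas as pd

# Same sample data as the example above
data = {'Name': ['Alice', 'Bob', None, 'David'],
        'Age': [25, None, 30, 22],
        'City': ['New York', 'Los Angeles', 'Chicago', None]}
df = pd.DataFrame(data)

# Step 1: boolean mask, True where a value is missing
mask = df.isnull()
print(mask)

# Step 2: mean of the booleans = share of True values per column
fraction = mask.mean()

# Equivalent: missing count per column divided by the number of rows
fraction_manual = mask.sum() / len(df)
print(fraction.equals(fraction_manual))

# Step 3: scale the fraction to a percentage
percent = fraction * 100
print(percent)
```

Each column has one missing value out of four rows, so every fraction is 0.25 and every percentage is 25.0.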
Common Pitfalls
Some common mistakes when checking missing values percentage include:
- Using `df.isnull().sum()` alone, which gives counts, not percentages.
- Forgetting to multiply by 100, which leaves a fraction between 0 and 1.
- Treating the per-column result as the missing rate for the whole dataset; for an overall percentage, count missing cells across all rows and columns.

Use `mean()` to get the fraction and multiply by 100 to get a percentage.
```python
import pandas as pd

data = {'A': [1, None, 3], 'B': [None, None, 6]}
df = pd.DataFrame(data)

# Wrong: gives counts, not percentages
print(df.isnull().sum())

# Right: gives percentages
print(df.isnull().mean() * 100)
```
Output
```
A    1
B    2
dtype: int64
A    33.333333
B    66.666667
dtype: float64
```
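
For the whole-dataset figure mentioned in the pitfalls, one option is to divide the total number of missing cells by `df.size` (rows times columns). Passing `axis=1` to `mean()` instead averages across columns, giving a percentage per row. This is a sketch on the same small DataFrame; `per_column`, `per_row`, and `overall` are illustrative names.

```python
import pandas as pd

# Same small DataFrame as the pitfall example
df = pd.DataFrame({'A': [1, None, 3], 'B': [None, None, 6]})

# Per-column percentage (what df.isnull().mean() * 100 gives)
per_column = df.isnull().mean() * 100

# Per-row percentage: average across the columns of each row
per_row = df.isnull().mean(axis=1) * 100

# Whole-dataset percentage: missing cells divided by all cells
overall = df.isnull().sum().sum() / df.size * 100

print(per_row)
print(overall)
```

Here 3 of the 6 cells are missing, so the overall figure is 50 percent, even though column B alone is about 67 percent missing.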
Quick Reference
| Method | Description | Output Type |
|---|---|---|
| df.isnull() | Detects missing values, returns boolean DataFrame | DataFrame of bool |
| df.isnull().sum() | Counts missing values per column | Series of int |
| df.isnull().mean() | Fraction of missing values per column | Series of float |
| df.isnull().mean() * 100 | Percentage of missing values per column | Series of float |
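
The count and percentage rows of the table are often easier to read side by side. A minimal sketch, assuming you want both in one summary table; the column names `missing_count` and `missing_percent` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, None, 3], 'B': [None, None, 6]})

# One row per original column: missing count and rounded percentage
summary = pd.DataFrame({
    'missing_count': df.isnull().sum(),
    'missing_percent': (df.isnull().mean() * 100).round(2),
})
print(summary)
```

Rounding to two decimals only affects display; use the unrounded Series if you need exact values for later calculations.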
Key Takeaways
- Use `df.isnull().mean() * 100` to get the percentage of missing values per column.
- Multiplying by 100 converts the fraction to a readable percentage.
- `df.isnull().sum()` only gives counts, not percentages.
- Check missing values per column to understand data quality.
- Check for missing values before analysis so you can decide how to handle them.
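
As one way to act on the per-column check, you can sort the percentages and list the columns that exceed a cut-off. This is a sketch only: the 50 percent `THRESHOLD` is a hypothetical value, not a pandas default or a general rule, and the third column `C` is sample data added for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, None, 3], 'B': [None, None, 6], 'C': [7, 8, 9]})

# Percentage of missing values per column, highest first
missing_pct = (df.isnull().mean() * 100).sort_values(ascending=False)

# Hypothetical cut-off: flag columns where more than half the values are missing
THRESHOLD = 50
high_missing = missing_pct[missing_pct > THRESHOLD].index.tolist()

print(missing_pct)
print(high_missing)
```

With this data, only column `B` (about 67 percent missing) is flagged; choose the threshold to suit your own dataset.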