How to Replace NaN with Mean in pandas DataFrame
Use the fillna() method with the column mean to replace NaN values in a pandas DataFrame. Calculate the mean with df['column'].mean(), then assign the result of df['column'].fillna(mean_value) back to the column to update the data.
Syntax
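The basic syntax to replace NaN values with the mean in a pandas DataFrame column is:
- df['column'].mean(): Calculates the mean of the specified column, ignoring NaN values.
- df['column'].fillna(value): Returns a copy of the column with NaN values replaced by the given value. Assign it back to df['column'] to update the DataFrame. Passing inplace=True modifies the object in place instead, but on a column selected from a DataFrame this triggers a chained-assignment warning in recent pandas versions and may leave the DataFrame unchanged.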
python
mean_value = df['column'].mean()
df['column'] = df['column'].fillna(mean_value)
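The note that mean() ignores NaN values can be checked with a short sketch. The Series values below are illustrative assumptions, not data from this article. It also shows an edge case worth knowing: if every value in a column is NaN, the mean is itself NaN, so filling with it changes nothing.

```python
import numpy as np
import pandas as pd

# Illustrative values (assumed for this sketch)
ages = pd.Series([25, np.nan, 30])

# By default mean() skips NaN (skipna=True), so only 25 and 30 are averaged
print(ages.mean())

# With skipna=False the NaN propagates into the result
print(ages.mean(skipna=False))

# Edge case: a column with no valid values has a NaN mean,
# so fillna() with that mean leaves every value missing
empty = pd.Series([np.nan, np.nan])
filled = empty.fillna(empty.mean())
print(filled.isna().all())
```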
Example
This example shows how to replace NaN values in the 'Age' column with the mean age in a pandas DataFrame.
python
import pandas as pd
import numpy as np

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, np.nan, 30, np.nan]}
df = pd.DataFrame(data)

# Mean of the non-missing ages: (25 + 30) / 2 = 27.5
mean_age = df['Age'].mean()

# Assign the filled column back to the DataFrame
df['Age'] = df['Age'].fillna(mean_age)
print(df)
Output
      Name   Age
0    Alice  25.0
1      Bob  27.5
2  Charlie  30.0
3    David  27.5
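The same idea extends to several columns at once. The sketch below is one possible approach, not the only one: it passes the per-column means to DataFrame.fillna(), which matches each mean to its column by label. The 'Score' column and its values are assumptions added for illustration.

```python
import numpy as np
import pandas as pd

# 'Score' is a hypothetical extra column added for this sketch
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, np.nan, 30, np.nan],
    'Score': [80.0, 90.0, np.nan, 70.0],
})

# Means of the numeric columns only; NaN values are skipped
column_means = df.mean(numeric_only=True)

# Passing a Series fills each column with the value under the matching label;
# columns without an entry (here 'Name') are left untouched
df_filled = df.fillna(column_means)
print(df_filled)
```

Because fillna() returns a new DataFrame here, the original df keeps its NaN values until the result is assigned back.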
Common Pitfalls
Common mistakes when replacing NaN with the mean include:
- Calling fillna() without a fill value, which raises an error. Calculate the mean first and pass it in.
- Forgetting to assign the result back to the column, so the DataFrame is not updated.
- Applying mean replacement to non-numeric columns, where mean() raises an error (for example, on a column of strings).
Always ensure the column is numeric and the mean is calculated first.
python
import pandas as pd
import numpy as np

data = {'Score': [10, np.nan, 20]}
df = pd.DataFrame(data)

# Wrong: the inner fillna() has no fill value, so this raises an error
# df['Score'].fillna(df['Score'].fillna(), inplace=True)

# Correct: calculate the mean first, then assign the filled column back
mean_score = df['Score'].mean()
df['Score'] = df['Score'].fillna(mean_score)
print(df)
Output
   Score
0   10.0
1   15.0
2   20.0
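To guard against the non-numeric pitfall, the dtype can be checked before filling. The following is a minimal sketch: fill_with_mean is a hypothetical helper written for this example (not a pandas function), and the 'City' column is assumed data. The check itself uses is_numeric_dtype from pandas.api.types.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype


def fill_with_mean(df, column):
    """Hypothetical helper: return a copy of df with NaN in `column` replaced by its mean."""
    if not is_numeric_dtype(df[column]):
        raise TypeError(f"Column {column!r} is not numeric; cannot fill with its mean")
    result = df.copy()
    result[column] = result[column].fillna(result[column].mean())
    return result


# 'City' is an assumed non-numeric column for illustration
df = pd.DataFrame({'City': ['Paris', None, 'Rome'], 'Score': [10, np.nan, 20]})

filled = fill_with_mean(df, 'Score')
print(filled)

try:
    fill_with_mean(df, 'City')
except TypeError as exc:
    print(exc)
```

Returning a copy keeps the caller's DataFrame unchanged, which makes it easy to compare the data before and after filling.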
Quick Reference
Summary tips for replacing NaN with the mean in pandas:
- Use df['col'].mean() to get the mean.
- Use df['col'] = df['col'].fillna(mean_value) to replace NaN.
- Check that the column data type is numeric before applying.
- Use inplace=True only when calling fillna() on the DataFrame itself; for a selected column, assign the result back.
Key Takeaways
Calculate the mean of the column before replacing NaN values.
Assign the result of fillna() back to the column to update the DataFrame.
Only apply mean replacement on numeric columns to avoid errors.
Always verify the DataFrame after replacement to confirm changes.
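One simple way to verify the replacement is to count missing values before and after filling. This is a sketch using the 'Age' values from the example above. The variable names missing_before and missing_after are chosen for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Age': [25, np.nan, 30, np.nan]})

# Count missing values before filling
missing_before = int(df['Age'].isna().sum())

df['Age'] = df['Age'].fillna(df['Age'].mean())

# Count again after filling; zero means every NaN was replaced
missing_after = int(df['Age'].isna().sum())

print(f"missing before: {missing_before}, after: {missing_after}")
```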