
Dropna vs Fillna in pandas: Key Differences and Usage

In pandas, dropna removes rows or columns containing missing values, while fillna replaces missing values with a specified value or method. Use dropna to discard incomplete data and fillna to keep data by filling gaps.

Quick Comparison

This table summarizes the main differences between dropna and fillna in pandas.

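| Aspect | dropna | fillna |
|---|---|---|
| Purpose | Removes rows or columns with missing values | Replaces missing values with specified values or methods |
| Effect on data size | Reduces the number of rows (or columns), or leaves it unchanged if nothing qualifies for removal | Keeps the same shape |
| Key parameters | axis, how, thresh, subset, inplace | value, axis, limit, inplace (method is deprecated since pandas 2.1) |
| Use case | Remove incomplete data | Impute or fill missing data |
| Returns | A new DataFrame or Series with the selected rows or columns dropped (None if inplace=True) | A new DataFrame or Series with missing values replaced (None if inplace=True) |
| Common options | how='any' (default) or how='all' to control what gets dropped | A constant, a dict or Series of per-column values, or forward/backward fill via ffill() and bfill() |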
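The "effect on data size" and "returns" rows can be checked directly. A minimal sketch, reusing the sample data from the code comparison in this article; `dropped`, `filled`, and `df_copy` are illustrative names:

```python
import pandas as pd

# Same sample data as the code comparison examples
df = pd.DataFrame({'A': [1, 2, None, 4], 'B': [None, 2, 3, 4], 'C': [1, None, None, 4]})

dropped = df.dropna()   # new DataFrame with incomplete rows removed
filled = df.fillna(0)   # new DataFrame with every NaN replaced by 0

print(df.shape, dropped.shape, filled.shape)  # (4, 3) (1, 3) (4, 3)

# With inplace=True the object itself is modified and the call returns None
df_copy = df.copy()
result = df_copy.dropna(inplace=True)
print(result, df_copy.shape)  # None (1, 3)
```

By default neither method touches the original `df`; only the `inplace=True` call changes the object it is called on.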

Key Differences

dropna and fillna serve opposite purposes in handling missing data. dropna removes entire rows (or, with axis=1, entire columns) that contain NaN values, which reduces the size of the dataset. This is useful when incomplete data cannot be trusted or is not needed.
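The axis and how parameters decide what dropna removes. A minimal sketch: the first part reuses the article's sample data, and `sparse` is a small hypothetical frame built to contain one row that is entirely missing:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, None, 4], 'B': [None, 2, 3, 4], 'C': [1, None, None, 4]})

# axis=1 drops columns instead of rows. Every column in this sample has at
# least one NaN, so all columns are dropped and only the row index remains.
no_cols = df.dropna(axis=1)
print(no_cols.shape)  # (4, 0)

# how='all' drops a row only when every value in it is missing (row 1 here)
sparse = pd.DataFrame({'A': [1, None, None], 'B': [None, None, 3]})
print(len(sparse.dropna(how='any')), len(sparse.dropna(how='all')))  # 0 2
```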

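On the other hand, fillna replaces missing values with a specified constant (or a dict or Series of per-column values). To propagate existing values instead, use forward fill, which copies the last valid value down into the gap, or backward fill, which copies the next valid value up into it. In pandas 2.1 and later these are called as the ffill() and bfill() methods, because fillna(method='ffill') and fillna(method='bfill') are deprecated. Either way, the dataset keeps its shape, which helps when you want to keep all data points but fix gaps.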
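Forward and backward fill work column by column by default. A minimal sketch on the same sample data, using the ffill() and bfill() methods rather than the deprecated method= argument:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, None, 4], 'B': [None, 2, 3, 4], 'C': [1, None, None, 4]})

# Forward fill: each NaN takes the last valid value above it in the same column.
# B's first value has nothing above it, so it stays NaN.
forward = df.ffill()

# Backward fill: each NaN takes the next valid value below it in the same column
backward = df.bfill()

print(forward)
print(backward)
```

Note that neither direction can fill a gap at the edge it starts from, which is why a leading NaN survives ffill and a trailing NaN survives bfill.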

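Both methods accept parameters that control their behavior. dropna takes axis (drop rows or columns), how ('any' drops a row or column with at least one missing value, 'all' only one that is entirely missing), thresh (the minimum number of non-missing values a row or column needs in order to be kept), and subset (the labels to check for missing values). fillna takes the fill value, the axis, and limit, the maximum number of missing values to fill.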
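A minimal sketch of thresh, subset, and limit on the article's sample data; `limited` is an illustrative name:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, None, 4], 'B': [None, 2, 3, 4], 'C': [1, None, None, 4]})

# thresh=2: keep rows with at least 2 non-missing values (row 2 has only 1)
print(df.dropna(thresh=2).index.tolist())             # [0, 1, 3]

# subset: only the listed columns are checked for NaN
print(df.dropna(subset=['C']).index.tolist())         # [0, 3]
print(df.dropna(subset=['A', 'B']).index.tolist())    # [1, 3]

# limit=1: fill at most one NaN per column, scanning top to bottom,
# so the second NaN in column C (row 2) is left in place
limited = df.fillna(0, limit=1)
print(limited)
```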


Code Comparison

Here is an example showing how dropna removes rows with missing values from a DataFrame.

python
import pandas as pd

data = {'A': [1, 2, None, 4], 'B': [None, 2, 3, 4], 'C': [1, None, None, 4]}
df = pd.DataFrame(data)

# Drop rows with any missing values
df_dropped = df.dropna()
print(df_dropped)
Output
     A    B    C
3  4.0  4.0  4.0

fillna Equivalent

This example shows how fillna replaces missing values with a constant value in the same DataFrame.

python
import pandas as pd

data = {'A': [1, 2, None, 4], 'B': [None, 2, 3, 4], 'C': [1, None, None, 4]}
df = pd.DataFrame(data)

# Fill missing values with 0
df_filled = df.fillna(0)
print(df_filled)
Output
     A    B    C
0  1.0  0.0  1.0
1  2.0  2.0  0.0
2  0.0  3.0  0.0
3  4.0  4.0  4.0
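A single constant such as 0 is not the only option: value also accepts a dict or a Series keyed by column name. A minimal sketch on the same sample data; the constants 0 and -1 are arbitrary, and filling with column means (mean imputation) is one common choice assumed here for illustration, not the only reasonable one:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, None, 4], 'B': [None, 2, 3, 4], 'C': [1, None, None, 4]})

# A dict fills each listed column with its own constant; column C is not
# listed, so its NaN values are left as they are
per_column = df.fillna({'A': 0, 'B': -1})

# A Series (here the column means) fills each column with its matching value.
# mean() skips NaN, so A's mean is (1 + 2 + 4) / 3
mean_filled = df.fillna(df.mean())
print(mean_filled)
```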

When to Use Which

Choose dropna when you want to remove incomplete data that might skew analysis or when missing values are rare and dropping them won't lose much information. It is best for cleaning datasets before modeling when only complete cases are needed.
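Before dropping rows, it can help to measure how much data dropna would actually discard. A minimal sketch on the same sample data; what counts as losing "much" information is a judgment call the code does not make for you:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, None, 4], 'B': [None, 2, 3, 4], 'C': [1, None, None, 4]})

# Share of rows that are complete cases, i.e. what dropna() would keep
complete = df.dropna()
kept_fraction = len(complete) / len(df)
print(f"dropna keeps {len(complete)} of {len(df)} rows ({kept_fraction:.0%})")  # dropna keeps 1 of 4 rows (25%)

# Missing values per column show where the losses come from
print(df.isna().sum())
```

On this small sample, dropping would throw away three quarters of the rows, which is the kind of situation where filling is usually the better option.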

Choose fillna when you want to keep all data points and handle missing values by imputing reasonable replacements. This is useful for time series, where forward fill carries the last observation into a gap, or when missing values are common enough that dropping them would discard too much data.
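For time series, forward fill carries the last observation into a gap. A minimal sketch with hypothetical daily readings (the values and dates are made up for illustration), using limit so that a long gap is not silently papered over:

```python
import pandas as pd

# Hypothetical daily readings with a three-day gap
readings = pd.Series(
    [20.5, None, None, None, 22.0, 21.5],
    index=pd.date_range("2024-01-01", periods=6, freq="D"),
)

# Carry the last observed reading forward for at most 2 consecutive days;
# the third missing day stays NaN
filled = readings.ffill(limit=2)
print(filled)
```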

Key Takeaways

Use dropna to remove rows or columns with missing data, reducing dataset size.
Use fillna to replace missing values and keep dataset size unchanged.
dropna is best when incomplete data is unreliable or minimal.
fillna is best when you want to impute missing values and preserve data.
Both methods have parameters to customize behavior for your specific needs.
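The two methods are often combined in one cleaning step. A sketch under assumed requirements: suppose column 'A' is required for the analysis (a hypothetical rule for this sample data), so rows missing it are dropped, and the remaining gaps are filled with each column's median, another common imputation choice assumed here:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, None, 4], 'B': [None, 2, 3, 4], 'C': [1, None, None, 4]})

# Drop only the rows where the required column 'A' is missing
cleaned = df.dropna(subset=['A'])

# Fill the remaining gaps with per-column medians computed on the kept rows
cleaned = cleaned.fillna(cleaned.median())
print(cleaned)
```

Computing the medians after the drop keeps the fill values consistent with the rows that remain.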