Dropna vs Fillna in pandas: Key Differences and Usage
dropna removes rows or columns containing missing values, while fillna replaces missing values with a specified value or a fill method. Use dropna to discard incomplete data and fillna to keep data by filling gaps.
Quick Comparison
This table summarizes the main differences between dropna and fillna in pandas.
| Aspect | dropna | fillna |
|---|---|---|
| Purpose | Removes rows or columns with missing values | Replaces missing values with specified values or methods |
| Effect on Data Size | Reduces the number of rows or columns whenever something is dropped | Keeps the shape unchanged; only the missing cells change |
| Parameters | axis, how, thresh, subset, inplace | value, method (deprecated since pandas 2.1), axis, limit, inplace |
| Use Case | Remove incomplete data | Impute or fill missing data |
| Returns | DataFrame or Series with the matching rows or columns removed (None when inplace=True) | DataFrame or Series with missing data replaced (None when inplace=True) |
| Common Methods | how='any' or how='all' to control when to drop | Constant values, or ffill() / bfill() (method='ffill' / 'bfill' in older pandas) |
Key Differences
dropna and fillna serve opposite purposes in handling missing data. dropna removes entire rows or columns that contain NaN values, effectively reducing the dataset size. This is useful when incomplete data cannot be trusted or is not needed.
On the other hand, fillna replaces missing values with a specified constant, or uses methods like forward fill (ffill) or backward fill (bfill) to propagate existing values. This keeps the dataset size unchanged and is helpful when you want to keep all data points but fix gaps.
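The three fill strategies described above can be sketched on a small made-up Series. This sketch uses the ffill() and bfill() methods directly, since recent pandas releases deprecate the method argument of fillna in favor of them. The data and variable names are illustrative only.

```python
import pandas as pd

# Made-up data: two consecutive gaps between known values
s = pd.Series([1.0, None, None, 4.0])

constant = s.fillna(0)  # replace every gap with a fixed value
forward = s.ffill()     # carry the last known value forward
backward = s.bfill()    # pull the next known value backward

print(constant.tolist())  # [1.0, 0.0, 0.0, 4.0]
print(forward.tolist())   # [1.0, 1.0, 1.0, 4.0]
print(backward.tolist())  # [1.0, 4.0, 4.0, 4.0]
```

In all three cases the result has the same length as the input; only the missing cells change.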
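Both methods accept parameters to control their behavior. dropna lets you choose whether to drop rows or columns (axis), whether a single missing value or only an all-missing row or column triggers a drop (how), the minimum number of non-missing values a row or column needs in order to be kept (thresh), and which labels to inspect (subset). fillna lets you choose the fill value, the fill method, and a limit on how many missing values to fill.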
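As a rough illustration of these parameters, the sketch below applies each one to a small made-up DataFrame. The variable names are hypothetical and the data exists only to make the effect of each argument visible.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, None, 4],
    "B": [None, 2, 3, 4],
    "C": [1, None, None, 4],
})

rows_any = df.dropna()                 # drop rows with at least one NaN
rows_all = df.dropna(how="all")        # drop rows only if every value is NaN
rows_thresh = df.dropna(thresh=2)      # keep rows with at least 2 non-NaN values
rows_subset = df.dropna(subset=["A"])  # only look at column A when deciding
cols_any = df.dropna(axis=1)           # drop columns instead of rows

print(rows_any.index.tolist())     # [3]
print(rows_all.index.tolist())     # [0, 1, 2, 3]
print(rows_thresh.index.tolist())  # [0, 1, 3]
print(rows_subset.index.tolist())  # [0, 1, 3]
print(cols_any.shape)              # (4, 0): every column has a NaN

# limit caps how many gaps are filled; shown on one column for clarity
c_limited = df["C"].fillna(0, limit=1)
print(c_limited.isna().sum())      # 1: the second gap in C is left alone
```

Note that thresh counts non-missing values, so a larger thresh is stricter, not more lenient.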
Code Comparison
Here is an example showing how dropna removes rows with missing values from a DataFrame.

    import pandas as pd

    data = {'A': [1, 2, None, 4],
            'B': [None, 2, 3, 4],
            'C': [1, None, None, 4]}
    df = pd.DataFrame(data)

    # Drop rows with any missing values
    df_dropped = df.dropna()
    print(df_dropped)

fillna Equivalent
This example shows how fillna replaces missing values with a constant value in the same DataFrame.

    import pandas as pd

    data = {'A': [1, 2, None, 4],
            'B': [None, 2, 3, 4],
            'C': [1, None, None, 4]}
    df = pd.DataFrame(data)

    # Fill missing values with 0
    df_filled = df.fillna(0)
    print(df_filled)

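Filling with 0 is only one option, and a constant can distort statistics such as the mean. As an alternative sketch, fillna also accepts a dict or Series that maps column names to fill values, so each column can receive its own replacement. The per-column mean and the values 0 and -1 below are illustrative choices, not a recommendation for every dataset.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, None, 4],
    "B": [None, 2, 3, 4],
    "C": [1, None, None, 4],
})

# Series of per-column means; NaN values are skipped when averaging
column_means = df.mean()
df_mean_filled = df.fillna(column_means)

# A dict fills only the listed columns; column C is left untouched
df_custom = df.fillna({"A": 0, "B": -1})

print(df_mean_filled)
print(df_custom)
```

With this data the mean of B is 3.0 and the mean of C is 2.5, so those are the values written into the gaps of each column.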
When to Use Which
Choose dropna when you want to remove incomplete data that might skew analysis or when missing values are rare and dropping them won't lose much information. It is best for cleaning datasets before modeling when only complete cases are needed.
Choose fillna when you want to keep all data points and handle missing values by imputing reasonable replacements. This is useful in time series or when missing data is common but should not be discarded.
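One way to make this choice concrete is to measure how much data dropna would discard before committing to it. The sketch below is a rough heuristic under stated assumptions: the cutoff MAX_ROW_LOSS is a hypothetical value picked for illustration, and forward fill is only one of several possible fallbacks.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, None, 4],
    "B": [None, 2, 3, 4],
    "C": [1, None, None, 4],
})

# Fraction of missing values in each column
missing_per_column = df.isna().mean()

# Fraction of rows that dropna() would remove
rows_lost = 1 - len(df.dropna()) / len(df)

# Hypothetical cutoff, chosen only for this illustration
MAX_ROW_LOSS = 0.05

if rows_lost <= MAX_ROW_LOSS:
    cleaned = df.dropna()  # few incomplete rows: discard them
else:
    cleaned = df.ffill()   # many incomplete rows: keep them and fill gaps

print(missing_per_column)
print(rows_lost)  # 0.75
print(cleaned)
```

Here dropna would discard three of the four rows, so the sketch falls back to filling. The first value of B remains missing after the forward fill because there is no earlier value to propagate, which is worth checking for whenever ffill is used.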
Key Takeaways
- Use dropna to remove rows or columns with missing data, reducing dataset size.
- Use fillna to replace missing values and keep dataset size unchanged.
- dropna is best when incomplete data is unreliable or minimal.
- fillna is best when you want to impute missing values and preserve data.