How to Use dropna in pandas to Remove Missing Data
Use dropna() in pandas to remove rows or columns with missing values (NaN or None) from a DataFrame or Series. You can choose whether to drop rows or columns, and how strict the drop condition is, with parameters like axis and how.
Syntax
The basic syntax of dropna() is:
df.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)
Explanation of parameters:
- axis: 0 to drop rows, 1 to drop columns with missing values.
- how: 'any' drops if any NaN present, 'all' drops only if all values are NaN.
- thresh: require a minimum number of non-NaN values to keep the row/column. In recent pandas versions it cannot be passed together with how.
- subset: specify columns to check for NaN when dropping rows.
- inplace: if True, modifies the original DataFrame instead of returning a new one.
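The thresh and subset parameters are the least obvious ones in the list above, so here is a minimal sketch of how they behave. The sample data is made up for illustration and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, made up for this illustration
df = pd.DataFrame({
    "Name": ["Alice", "Bob", None, "David"],
    "Age": [25, np.nan, 30, 22],
    "City": ["New York", "Los Angeles", None, None],
})

# thresh=2: keep rows that have at least 2 non-missing values
at_least_two = df.dropna(thresh=2)

# subset=["Age"]: only a missing Age causes a row to be dropped
has_age = df.dropna(subset=["Age"])

print(at_least_two.index.tolist())  # [0, 1, 3]
print(has_age.index.tolist())       # [0, 2, 3]
```

Row 2 has only one non-missing value, so thresh=2 drops it, while subset=["Age"] keeps it because its Age is present.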
Example
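This example shows how to remove rows with any missing values from a DataFrame using dropna(). It also calls dropna() with axis=1 and how='all' to drop columns whose values are all missing. In this data no column is entirely missing, so that second call returns every column unchanged.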
```python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', None, 'David'],
    'Age': [25, None, 30, 22],
    'City': ['New York', 'Los Angeles', None, None],
}
df = pd.DataFrame(data)

# Drop rows with any NaN values
cleaned_rows = df.dropna()

# Drop columns where all values are NaN
cleaned_cols = df.dropna(axis=1, how='all')

print('Original DataFrame:')
print(df)
print('\nAfter dropping rows with any NaN:')
print(cleaned_rows)
print('\nAfter dropping columns with all NaN:')
print(cleaned_cols)
```
Output
```
Original DataFrame:
    Name   Age         City
0  Alice  25.0     New York
1    Bob   NaN  Los Angeles
2   None  30.0         None
3  David  22.0         None

After dropping rows with any NaN:
    Name   Age      City
0  Alice  25.0  New York

After dropping columns with all NaN:
    Name   Age         City
0  Alice  25.0     New York
1    Bob   NaN  Los Angeles
2   None  30.0         None
3  David  22.0         None
```
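Because no column in the example above is entirely empty, how='all' has nothing to remove there. The following sketch uses a hypothetical frame with one fully empty column (named Notes here purely for illustration) to show the difference between how='all' and how='any' when dropping columns.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one entirely empty column ("Notes")
df2 = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carol"],
    "Age": [25.0, np.nan, 31.0],
    "Notes": [np.nan, np.nan, np.nan],
})

# how='all' removes only the column that has no values at all
no_empty_cols = df2.dropna(axis=1, how="all")

# how='any' removes every column that contains at least one NaN
only_complete_cols = df2.dropna(axis=1, how="any")

print(no_empty_cols.columns.tolist())       # ['Name', 'Age']
print(only_complete_cols.columns.tolist())  # ['Name']
```

With how='any', the Age column is lost because of a single missing value, which is often more aggressive than intended.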
Common Pitfalls
Common mistakes when using dropna() include:
- Forgetting to set inplace=True if you want to modify the original DataFrame.
- Not specifying axis correctly, which can lead to dropping rows instead of columns or vice versa.
- Using dropna() without understanding the how parameter, which can remove more data than intended.
- Not using subset when you want to check NaNs only in specific columns.
Example of a common mistake and fix:
```python
import pandas as pd

df = pd.DataFrame({'A': [1, None, 3], 'B': [None, None, 6]})

# Wrong: This does not change df because inplace=False by default
df.dropna()
print('DataFrame after dropna without inplace:')
print(df)

# Right: Use inplace=True to modify df
df.dropna(inplace=True)
print('\nDataFrame after dropna with inplace=True:')
print(df)
```
Output
```
DataFrame after dropna without inplace:
     A    B
0  1.0  NaN
1  NaN  NaN
2  3.0  6.0

DataFrame after dropna with inplace=True:
     A    B
2  3.0  6.0
```
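The other pitfalls can be sketched on the same small DataFrame. This is an illustrative example, not the only way to handle them: it assigns the result to a new name as an alternative to inplace=True, and uses subset and how to avoid dropping more rows than intended. The variable names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, None, 3], "B": [None, None, 6]})

# Alternative to inplace=True: assign the returned DataFrame to a name
cleaned = df.dropna()

# Checking only column A keeps row 0, which a plain dropna() removes
a_present = df.dropna(subset=["A"])

# how='all' drops only row 1, where both A and B are missing
not_fully_empty = df.dropna(how="all")

print(cleaned.index.tolist())          # [2]
print(a_present.index.tolist())        # [0, 2]
print(not_fully_empty.index.tolist())  # [0, 2]
print(len(df))                         # 3, the original is untouched
```

Assigning the result keeps the original df available for comparison, which can be handy while you are still deciding how strict the cleaning should be.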
Quick Reference
Here is a quick summary of key dropna() options:
| Parameter | Description | Default |
|---|---|---|
| axis | 0 to drop rows, 1 to drop columns | 0 |
| how | 'any' drops if any NaN, 'all' drops if all NaN | 'any' |
| thresh | Minimum non-NaN values to keep row/column | None |
| subset | Columns to check for NaN when dropping rows | None |
| inplace | Modify original DataFrame if True | False |
Key Takeaways
- Use dropna() to remove rows or columns with missing values in pandas DataFrames or Series.
- Set axis=0 to drop rows and axis=1 to drop columns containing NaNs.
- Use how='any' to drop if any NaN exists, or how='all' to drop only if all values are NaN.
- Remember to use inplace=True to modify the original DataFrame directly.
- Use the subset parameter to check for NaNs only in specific columns when dropping rows.