
How to Use dropna in pandas to Remove Missing Data

Use dropna() in pandas to remove rows or columns that contain missing values (NaN or None) from a DataFrame, or to remove missing entries from a Series. The axis parameter selects whether rows or columns are dropped, and how, thresh, and subset control how much missing data it takes to trigger a drop.

📐

Syntax

The basic syntax of dropna() is:

df.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)

Explanation of parameters:

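axis: 0 drops rows that contain missing values, 1 drops columns.
how: 'any' drops a row/column if at least one value is NaN, 'all' drops it only if every value is NaN.
thresh: minimum number of non-NaN values a row/column needs in order to be kept. Recent pandas versions raise a TypeError when how and thresh are both passed in one call, so supply only one of them.
subset: labels to check for NaN. When dropping rows, these are column names.
inplace: if True, modifies the original DataFrame and returns None instead of returning a new DataFrame.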
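The thresh and subset parameters are the easiest to misread, so here is a minimal sketch of both. The sample data is hypothetical and chosen only so that each row has a different number of missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, used only to illustrate thresh and subset
df = pd.DataFrame({
    "Name": ["Alice", "Bob", None, "David"],
    "Age": [25, np.nan, 30, 22],
    "City": ["New York", "Los Angeles", None, None],
})

# Keep only rows that have at least 2 non-missing values
at_least_two = df.dropna(thresh=2)

# Drop rows only when 'Age' is missing; NaNs in other columns are ignored
age_known = df.dropna(subset=["Age"])

print(at_least_two.index.tolist())  # [0, 1, 3]
print(age_known.index.tolist())     # [0, 2, 3]
```

Row 2 has a single non-missing value, so thresh=2 removes it, while subset=["Age"] removes only row 1, the one row with no age.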


💻

Example

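This example shows how to remove rows with any missing values from a DataFrame using dropna(). It also shows the call for dropping columns in which all values are missing. No column in this data is entirely NaN, so that second call returns the DataFrame unchanged.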

python

import pandas as pd

data = {'Name': ['Alice', 'Bob', None, 'David'],
        'Age': [25, None, 30, 22],
        'City': ['New York', 'Los Angeles', None, None]}

df = pd.DataFrame(data)

# Drop rows with any NaN values
cleaned_rows = df.dropna()

# Drop columns where all values are NaN
cleaned_cols = df.dropna(axis=1, how='all')

print('Original DataFrame:')
print(df)
print('\nAfter dropping rows with any NaN:')
print(cleaned_rows)
print('\nAfter dropping columns with all NaN:')
print(cleaned_cols)

Output

Original DataFrame:
    Name   Age         City
0  Alice  25.0     New York
1    Bob   NaN  Los Angeles
2   None  30.0         None
3  David  22.0         None

After dropping rows with any NaN:
    Name   Age      City
0  Alice  25.0  New York

After dropping columns with all NaN:
    Name   Age         City
0  Alice  25.0     New York
1    Bob   NaN  Los Angeles
2   None  30.0         None
3  David  22.0         None
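Because no column in the data above is entirely missing, the axis=1, how='all' call has no visible effect there. The following sketch, which assumes a hypothetical Notes column that contains only NaN, shows how 'all' and 'any' differ when dropping columns.

```python
import numpy as np
import pandas as pd

# 'Notes' is a hypothetical column added so that one column is entirely missing
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carol"],
    "Age": [25, np.nan, 30],
    "Notes": [np.nan, np.nan, np.nan],
})

# how='all' removes only columns in which every value is missing
all_missing_removed = df.dropna(axis=1, how="all")

# how='any' (the default) removes every column containing at least one NaN
any_missing_removed = df.dropna(axis=1, how="any")

print(all_missing_removed.columns.tolist())  # ['Name', 'Age']
print(any_missing_removed.columns.tolist())  # ['Name']
```

With how='any', the Age column is lost because of a single missing value, which is why 'all' is often the safer choice when dropping columns.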

⚠️

Common Pitfalls

Common mistakes when using dropna() include:

Forgetting that dropna() returns a new object. Assign the result (df = df.dropna()) or pass inplace=True if you want df itself to change.
Mixing up axis, which drops rows (axis=0) when you meant columns (axis=1), or vice versa.
Relying on the default how='any' without realizing that it drops a row as soon as one value is missing, which can remove more data than intended.
Omitting subset when only NaNs in specific columns should cause a row to be dropped.

Example of a common mistake and fix:

python

import pandas as pd

df = pd.DataFrame({'A': [1, None, 3], 'B': [None, None, 6]})

# Wrong: This does not change df because inplace=False by default
df.dropna()
print('DataFrame after dropna without inplace:')
print(df)

# Right: Use inplace=True to modify df
df.dropna(inplace=True)
print('\nDataFrame after dropna with inplace=True:')
print(df)

Output

DataFrame after dropna without inplace:
     A    B
0  1.0  NaN
1  NaN  NaN
2  3.0  6.0

DataFrame after dropna with inplace=True:
     A    B
2  3.0  6.0
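Reassigning the result is an alternative to inplace=True. This sketch reuses the same data and keeps the cleaned result under a separate name (cleaned, chosen here for illustration) so the original stays available.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, None, 3], "B": [None, None, 6]})

# dropna() returns a new DataFrame; the original is left untouched
cleaned = df.dropna()

print(len(df))       # 3
print(len(cleaned))  # 1

# Reassigning the name makes df refer to the cleaned result
df = df.dropna()
print(df.index.tolist())  # [2]
```

Keeping the original under its own name can help while exploring data, since you can compare row counts before and after the drop.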

📊

Quick Reference

Here is a quick summary of key dropna() options:

Parameter	Description	Default
axis	0 to drop rows, 1 to drop columns	0
how	'any' drops if any NaN, 'all' drops if all NaN	'any'
thresh	Minimum non-NaN values to keep row/column	None
subset	Columns to check for NaN when dropping rows	None
inplace	Modify original DataFrame if True	False

✅

Key Takeaways

Use dropna() to remove rows or columns with missing values in pandas DataFrames or Series.

Set axis=0 to drop rows and axis=1 to drop columns containing NaNs.

Use how='any' to drop if any NaN exists, or how='all' to drop only if all values are NaN.

Assign the result back (df = df.dropna()) or pass inplace=True, because calling dropna() on its own leaves the original DataFrame unchanged.

Use the subset parameter to check for NaNs only in specific columns when dropping rows.