
How to Use dropna in pandas to Remove Missing Data

Use dropna() in pandas to remove rows or columns that contain missing values (NaN, None, or NaT) from a DataFrame or Series. For a DataFrame, the axis parameter selects whether rows or columns are dropped, and how, thresh, and subset control how strict the drop condition is.
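Since dropna() also works on a Series, here is a minimal sketch of that case. The values are made up for illustration; on a Series there is only one axis, so the call simply drops the missing entries and keeps the original index labels.

```python
import pandas as pd

# Illustrative Series with one missing entry
s = pd.Series([1.0, None, 3.0])

# Returns a new Series without the missing value; s itself is unchanged
clean = s.dropna()

print(clean.tolist())        # remaining values
print(clean.index.tolist())  # original labels are kept, not renumbered
```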

Syntax

The basic syntax of dropna() is:

  • df.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)

Explanation of parameters:

  • axis: 0 to drop rows, 1 to drop columns with missing values.
  • how: 'any' drops if any NaN present, 'all' drops only if all values are NaN.
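  • thresh: the minimum number of non-NaN values a row/column must have to be kept. In recent pandas versions, thresh cannot be passed together with how.
  • subset: labels to check for NaN; when dropping rows, this is a list of column names.
  • inplace: if True, modifies the original DataFrame and returns None instead of returning a new DataFrame.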
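The thresh and subset parameters are easiest to understand side by side. The following is a small sketch using illustrative data (the variable names at_least_two and has_age are only for this example).

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', None, 'David'],
    'Age': [25, None, 30, 22],
    'City': ['New York', 'Los Angeles', None, None],
})

# thresh=2: keep only rows that have at least 2 non-missing values
at_least_two = df.dropna(thresh=2)

# subset=['Age']: drop a row only if its 'Age' value is missing
has_age = df.dropna(subset=['Age'])

print(at_least_two.index.tolist())
print(has_age.index.tolist())
```

Here row 2 has only one non-missing value, so thresh=2 removes it, while subset=['Age'] removes only row 1, the one row with no age.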

Example

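This example removes rows that contain any missing value from a DataFrame using dropna(). It also calls dropna(axis=1, how='all') to drop columns in which every value is missing; in this data no column is entirely missing, so that call returns all three columns unchanged.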

python
import pandas as pd

data = {'Name': ['Alice', 'Bob', None, 'David'],
        'Age': [25, None, 30, 22],
        'City': ['New York', 'Los Angeles', None, None]}

df = pd.DataFrame(data)

# Drop rows with any NaN values
cleaned_rows = df.dropna()

# Drop columns where all values are NaN
cleaned_cols = df.dropna(axis=1, how='all')

print('Original DataFrame:')
print(df)
print('\nAfter dropping rows with any NaN:')
print(cleaned_rows)
print('\nAfter dropping columns with all NaN:')
print(cleaned_cols)
Output
Original DataFrame:
    Name   Age         City
0  Alice  25.0     New York
1    Bob   NaN  Los Angeles
2   None  30.0         None
3  David  22.0         None

After dropping rows with any NaN:
    Name   Age      City
0  Alice  25.0  New York

After dropping columns with all NaN:
    Name   Age         City
0  Alice  25.0     New York
1    Bob   NaN  Los Angeles
2   None  30.0         None
3  David  22.0         None
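To see how='all' on columns actually remove something, here is a short sketch with a hypothetical 'Notes' column that contains no data at all. The column name and values are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Age': [25, None],
    'Notes': [None, None],  # hypothetical column that is entirely missing
})

# Drop only the columns where every value is missing
cleaned = df.dropna(axis=1, how='all')

print(cleaned.columns.tolist())
```

'Age' survives because it still has one real value; only 'Notes' is dropped.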

Common Pitfalls

Common mistakes when using dropna() include:

  • Forgetting that dropna() returns a new DataFrame by default. Assign the result back (df = df.dropna()) or set inplace=True if you want to modify the original DataFrame.
  • Not specifying axis correctly, which can lead to dropping rows instead of columns or vice versa.
  • Using dropna() without understanding the how parameter. The default how='any' drops a row if even one value is missing, which can remove more data than intended.
  • Not using subset when you want to check for NaNs only in specific columns.

Example of a common mistake and fix:

python
import pandas as pd

df = pd.DataFrame({'A': [1, None, 3], 'B': [None, None, 6]})

# Wrong: This does not change df because inplace=False by default
df.dropna()
print('DataFrame after dropna without inplace:')
print(df)

# Right: Use inplace=True to modify df
df.dropna(inplace=True)
print('\nDataFrame after dropna with inplace=True:')
print(df)
Output
DataFrame after dropna without inplace:
     A    B
0  1.0  NaN
1  NaN  NaN
2  3.0  6.0

DataFrame after dropna with inplace=True:
     A    B
2  3.0  6.0
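An alternative to inplace=True is to assign the result back to the same variable. The sketch below shows that pattern, plus an optional reset_index(drop=True) to renumber the remaining rows; the second DataFrame (other) is only there to illustrate that inplace=True returns None.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, None, 3], 'B': [None, None, 6]})

# Reassign the result instead of using inplace=True
df = df.dropna()

# Optional: renumber the surviving rows starting from 0
df = df.reset_index(drop=True)
print(df)

# inplace=True modifies the object and returns None,
# so its return value should not be assigned to anything useful
other = pd.DataFrame({'A': [1, None]})
result = other.dropna(inplace=True)
print(result)
```

Reassignment keeps the call chainable, for example df.dropna().reset_index(drop=True), which is not possible with inplace=True.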

Quick Reference

Here is a quick summary of key dropna() options:

| Parameter | Description | Default |
|-----------|-------------|---------|
| axis | 0 to drop rows, 1 to drop columns | 0 |
| how | 'any' drops if any NaN, 'all' drops if all NaN | 'any' |
| thresh | Minimum non-NaN values to keep row/column | None |
| subset | Columns to check for NaN when dropping rows | None |
| inplace | Modify original DataFrame if True | False |

Key Takeaways

Use dropna() to remove rows or columns with missing values in pandas DataFrames or Series.
Set axis=0 to drop rows and axis=1 to drop columns containing NaNs.
Use how='any' to drop if any NaN exists, or how='all' to drop only if all values are NaN.
Remember that dropna() returns a new object by default; assign the result back or use inplace=True to modify the original DataFrame directly.
Use the subset parameter to check for NaNs only in specific columns when dropping rows.