
How to Drop Missing Values in Python: Simple Guide

To drop missing values in Python, call the dropna() method from the pandas library on a DataFrame or Series. It removes the rows or columns that contain missing values (NaN, None, or NaT) and returns the cleaned result.

Syntax

The dropna() method works on pandas DataFrames and Series. On a DataFrame, the axis parameter selects whether rows or columns are dropped, and the how parameter controls whether a row or column is dropped when any of its values are missing or only when all of them are.

  • df.dropna(axis=0, how='any'): Drops rows with any missing values.
  • df.dropna(axis=1, how='all'): Drops columns where all values are missing.
python
df.dropna(axis=0, how='any')
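To make the two calls above concrete, here is a minimal sketch on a small made-up DataFrame. The column names a, b, c and the values are assumptions chosen only for illustration; the last lines show the same method on a Series, which takes no axis argument in practice because it has only one dimension.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column "b" is entirely missing, column "a" is partly missing
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, np.nan],
    "c": ["x", "y", "z"],
})

# Drop rows that contain at least one missing value.
# Every row has NaN in "b", so nothing survives.
rows_any = df.dropna(axis=0, how="any")

# Drop columns in which every value is missing.
# Only "b" qualifies, so "a" and "c" remain.
cols_all = df.dropna(axis=1, how="all")

# On a Series, dropna() simply removes the missing entries
s_clean = pd.Series([1.0, np.nan, 3.0]).dropna()

print(rows_any.shape)
print(list(cols_all.columns))
print(list(s_clean.index))
```

Note that how='any' on rows can empty a DataFrame when a single column is mostly missing, so dropping all-empty columns first is often a reasonable order of operations.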

Example

This example shows how to create a DataFrame with missing values and then drop rows that contain any missing values using dropna(). It demonstrates cleaning data by removing incomplete rows.

python
import pandas as pd

# Create a DataFrame with missing values
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, None, 30, 22],
        'City': ['New York', 'Los Angeles', None, 'Chicago']}
df = pd.DataFrame(data)

print("Original DataFrame:")
print(df)

# Drop rows with any missing values
clean_df = df.dropna()

print("\nDataFrame after dropping rows with missing values:")
print(clean_df)
Output
Original DataFrame:
      Name   Age         City
0    Alice  25.0     New York
1      Bob   NaN  Los Angeles
2  Charlie  30.0          NaN
3    David  22.0      Chicago

DataFrame after dropping rows with missing values:
    Name   Age      City
0  Alice  25.0  New York
3  David  22.0   Chicago

Common Pitfalls

One common mistake is forgetting that dropna() returns a new DataFrame and leaves the original unchanged unless you pass inplace=True or assign the result back (df = df.dropna()). Another is leaving axis at its default of 0 when you actually want to drop columns, which requires axis=1. Finally, be careful with the how parameter: 'any' drops a row or column if any value in it is missing, while 'all' drops it only if every value is missing.

python
import pandas as pd

df = pd.DataFrame({'A': [1, None, 3], 'B': [None, None, 6]})

# dropna() returns a new DataFrame; the original df is left unchanged
cleaned = df.dropna()
print("Original DataFrame after dropna without inplace:")
print(df)

# inplace=True modifies the original DataFrame instead of returning a copy
df.dropna(inplace=True)
print("\nDataFrame after dropna with inplace=True:")
print(df)
Output
Original DataFrame after dropna without inplace:
     A    B
0  1.0  NaN
1  NaN  NaN
2  3.0  6.0

DataFrame after dropna with inplace=True:
     A    B
2  3.0  6.0
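The axis and how pitfalls can be illustrated in the same way. The sketch below reuses the two-column DataFrame from the example above and compares the results of how='any', how='all', and axis=1; the variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, None, 3], 'B': [None, None, 6]})

# how='any' (the default): a row is dropped if it has at least one missing value
any_rows = df.dropna(how='any')

# how='all': a row is dropped only if every value in it is missing
all_rows = df.dropna(how='all')

# axis=1 checks columns instead of rows; both columns contain a missing
# value, so how='any' removes them all and leaves only the index
any_cols = df.dropna(axis=1, how='any')

print(list(any_rows.index))
print(list(all_rows.index))
print(any_cols.shape)
```

Here how='any' keeps only row 2, how='all' keeps rows 0 and 2, and the column-wise call leaves a DataFrame with three rows and no columns, which is rarely what was intended.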

Quick Reference

Parameter  Description                                             Default
axis       0 to drop rows, 1 to drop columns                       0
how        'any' drops if any missing, 'all' drops if all missing  'any'
thresh     Require minimum non-NA values to keep row/column        None
subset     Specify columns to check for missing values             None
inplace    Modify original DataFrame if True                       False
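The thresh and subset parameters are not shown in the examples above, so here is a short sketch of both. The data and column names are made up for illustration. As far as I know, recent pandas versions raise an error if how and thresh are passed together, so the sketch uses them separately.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 1 is missing age, row 2 is missing name and city
df = pd.DataFrame({
    "name": ["Alice", "Bob", None, "David"],
    "age": [25, np.nan, 30, 22],
    "city": ["New York", "Los Angeles", None, "Chicago"],
})

# thresh=2: keep rows that have at least 2 non-missing values.
# Row 2 has only one (age), so it is dropped.
at_least_two = df.dropna(thresh=2)

# subset=["age"]: only look at the "age" column when deciding what to drop.
# Row 1 is dropped; row 2 survives despite its other missing values.
age_known = df.dropna(subset=["age"])

print(list(at_least_two.index))
print(list(age_known.index))
```

subset is useful when only a few columns are essential to the analysis, while thresh tolerates a limited number of gaps per row.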

Key Takeaways

Use pandas dropna() to remove missing values from DataFrames or Series.
Remember dropna() returns a new object unless inplace=True is set.
Specify axis=0 to drop rows and axis=1 to drop columns with missing data.
Use how='any' to drop if any missing, or how='all' to drop only if all values are missing.
Check your data after dropping to ensure important information is not lost.
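One simple way to check what a drop would cost is to count missing values per column beforehand and compare row counts afterwards. This is a minimal sketch using the DataFrame from the Example section; the variable names missing_per_column and rows_dropped are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, None, 30, 22],
    'City': ['New York', 'Los Angeles', None, 'Chicago'],
})

# Count missing values in each column before dropping anything
missing_per_column = df.isna().sum()

# Drop incomplete rows, then measure how many rows were lost
clean_df = df.dropna()
rows_dropped = len(df) - len(clean_df)

print(missing_per_column)
print(f"Dropped {rows_dropped} of {len(df)} rows")
```

In this case two of the four rows are removed because of a single missing value each, which is the kind of loss worth noticing before continuing with the analysis.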