
How to Use drop_na in tidyr to Remove Missing Values in R

Use drop_na() from the tidyr package in R to remove rows containing missing values (NA) from a data frame. You can apply it to all columns or specify particular columns to check for missing values.

Syntax

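The basic syntax of drop_na() is simple: pass a data frame as the first argument. Optionally, list one or more columns after it. drop_na() then checks only those columns and removes a row if any of them is NA. Columns are chosen with tidyselect syntax, so bare column names work, as do helpers such as starts_with().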

  • drop_na(data): Removes every row that has an NA in any column.
  • drop_na(data, column1, column2): Removes rows that have an NA in column1 or column2; NAs in other columns are ignored.
r
drop_na(data, ...)
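When several columns are listed, a row is dropped if any one of the listed columns is NA, while NAs in unlisted columns are left alone. A minimal sketch using a small made-up data frame (the `scores` data and its columns are hypothetical, for illustration only):

```r
library(tidyr)

# Hypothetical example data: NAs appear in 'math', 'art', and 'note'
scores <- data.frame(
  id   = 1:4,
  math = c(90, NA, 75, 80),
  art  = c(NA, 85, 70, 95),
  note = c("a", "b", NA, NA)
)

# Check only 'math' and 'art': a row is dropped if either of them is NA.
# NAs in 'note' are ignored because 'note' is not listed.
both_scored <- drop_na(scores, math, art)
both_scored
```

Rows 1 and 2 are removed (missing `art` and `math`, respectively), while rows 3 and 4 are kept even though their `note` is NA.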

Example

This example shows how to remove rows with missing values from a data frame using drop_na(). It first removes rows with any NA, then only those with NA in the age column.

r
library(tidyr)
library(dplyr)

# Create example data frame
people <- tibble(
  name = c("Alice", "Bob", "Carol", "David"),
  age = c(25, NA, 30, 22),
  city = c("NY", "LA", NA, "Chicago")
)

# Remove rows with any NA
clean_any <- drop_na(people)

# Remove rows with NA only in 'age'
clean_age <- drop_na(people, age)

clean_any
clean_age
Output
# A tibble: 2 × 3
  name    age city
  <chr> <dbl> <chr>
1 Alice    25 NY
2 David    22 Chicago
# A tibble: 3 × 3
  name    age city
  <chr> <dbl> <chr>
1 Alice    25 NY
2 Carol    30 <NA>
3 David    22 Chicago
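drop_na() returns a new data frame rather than editing its input, so `people` still has all four rows after the calls above. A short sketch, recreating the same `people` tibble, of checking this and of keeping the cleaned result by assigning it back:

```r
library(tidyr)

people <- tibble(
  name = c("Alice", "Bob", "Carol", "David"),
  age = c(25, NA, 30, 22),
  city = c("NY", "LA", NA, "Chicago")
)

# drop_na() returns a new tibble; 'people' itself is left unchanged
clean_any <- drop_na(people)
nrow(people)     # 4
nrow(clean_any)  # 2

# To keep the cleaned data under the same name, assign the result back
people <- drop_na(people, age)
nrow(people)     # 3
```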

Common Pitfalls

One common mistake is calling drop_na() with no column arguments while expecting it to check only some columns. Without column arguments, it removes every row that has an NA in any column. Another pitfall is forgetting to load the tidyr package: if library(tidyr) has not been run, R stops with the error could not find function "drop_na".

r
library(tidyr)

people <- tibble(
  name = c("Alice", "Bob", "Carol"),
  age = c(25, NA, 30),
  city = c("NY", "LA", NA)
)

# Wrong: the goal is to drop only rows with a missing 'age', but with no
# columns given, rows with an NA in any column are removed (Bob and Carol)
drop_na(people)

# Correct: name 'age' so only rows with an NA in 'age' are removed (Bob)
drop_na(people, age)
Output
# A tibble: 1 × 3
  name    age city
  <chr> <dbl> <chr>
1 Alice    25 NY
# A tibble: 2 × 3
  name    age city
  <chr> <dbl> <chr>
1 Alice    25 NY
2 Carol    30 <NA>
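If you only need drop_na() occasionally, you can also call it with the package prefix instead of attaching tidyr with library(). A minimal sketch, assuming tidyr is installed; it uses base R's data.frame() so that nothing needs to be attached:

```r
# No library(tidyr) call here: the tidyr:: prefix finds drop_na() directly,
# as long as the tidyr package is installed
people <- data.frame(
  name = c("Alice", "Bob", "Carol"),
  age = c(25, NA, 30),
  city = c("NY", "LA", NA)
)

# Drops Bob (missing age) and keeps Carol (missing city only)
clean_age <- tidyr::drop_na(people, age)
clean_age
```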

Quick Reference

| Usage | Description |
|---|---|
| drop_na(data) | Remove rows that have a missing value in any column |
| drop_na(data, col1, col2) | Remove rows that have a missing value in col1 or col2 |
| Requires tidyr package | Load with library(tidyr) before use |
| Return value | A data frame of the same type as the input (a tibble stays a tibble), without the dropped rows |
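drop_na() gives back the same kind of data frame it receives: a plain data.frame stays a data.frame, and a tibble stays a tibble. A quick check using small made-up inputs (`df` and `tb` are hypothetical names for illustration):

```r
library(tidyr)

# The same made-up data as a base data.frame and as a tibble
df <- data.frame(x = c(1, NA, 3), y = c("a", "b", NA))
tb <- tibble(x = c(1, NA, 3), y = c("a", "b", NA))

class(drop_na(df))  # "data.frame"
class(drop_na(tb))  # "tbl_df" "tbl" "data.frame"
```

In both cases only the first row survives, because rows 2 and 3 each contain an NA.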

Key Takeaways

  • Use drop_na() to remove rows with missing values from data frames in R.
  • Name columns in drop_na() to check for missing values only in those columns.
  • If no columns are named, drop_na() removes rows with an NA in any column.
  • Load tidyr with library(tidyr) before calling drop_na(), or call it as tidyr::drop_na().
  • drop_na() returns a new data frame and never modifies the original; assign the result to keep it.