How to Use drop_na in tidyr to Remove Missing Values in R
Use drop_na() from the tidyr package in R to remove rows containing missing values (NA) from a data frame. You can apply it to all columns or specify particular columns to check for missing values.
Syntax
The basic syntax of drop_na() is simple. You pass a data frame as the first argument. Optionally, you can specify one or more column names to only remove rows where those columns have missing values.
drop_na(data): Removes rows with an NA in any column.
drop_na(data, column1, column2): Removes rows with an NA only in the specified columns.
r
drop_na(data, ...)
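Because the ... argument takes column selections, tidyselect helpers such as starts_with() and all_of() should work there too. The following is a minimal sketch, assuming tidyr is installed; the scores data frame and its column names are made up for illustration.

```r
library(tidyr)

# Hypothetical data: an id, two test-score columns, and a free-text note
scores <- tibble(
  id     = c(1, 2, 3, 4),
  test_a = c(90, NA, 85, 70),
  test_b = c(88, 92, NA, 75),
  note   = c(NA, "late", "ok", NA)
)

# Check only the columns whose names start with "test_";
# NAs in 'note' are ignored
complete_tests <- drop_na(scores, starts_with("test_"))

# Same selection, with the column names stored in a character vector
cols <- c("test_a", "test_b")
complete_tests2 <- drop_na(scores, all_of(cols))

complete_tests
```

Selecting by helper keeps the call short when many columns share a naming pattern, and all_of() is useful when the column names are only known at run time.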
Example
This example shows how to remove rows with missing values from a data frame using drop_na(). It first removes rows with any NA, then only those with NA in the age column.
r
library(tidyr)
library(dplyr)

# Create example data frame
people <- tibble(
  name = c("Alice", "Bob", "Carol", "David"),
  age = c(25, NA, 30, 22),
  city = c("NY", "LA", NA, "Chicago")
)

# Remove rows with any NA
clean_any <- drop_na(people)

# Remove rows with NA only in 'age'
clean_age <- drop_na(people, age)

clean_any
clean_age
Output
# A tibble: 2 × 3
  name    age city
  <chr> <dbl> <chr>
1 Alice    25 NY
2 David    22 Chicago

# A tibble: 3 × 3
  name    age city
  <chr> <dbl> <chr>
1 Alice    25 NY
2 Carol    30 NA
3 David    22 Chicago
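drop_na() takes the data frame as its first argument, so it also fits into a pipeline. A short sketch, assuming tidyr is installed and R 4.1 or later for the native |> pipe; the n_dropped variable is introduced here only for illustration.

```r
library(tidyr)

# Same example data as above
people <- tibble(
  name = c("Alice", "Bob", "Carol", "David"),
  age  = c(25, NA, 30, 22),
  city = c("NY", "LA", NA, "Chicago")
)

# Pipe the data into drop_na(); only 'age' is checked for NA
clean_age <- people |> drop_na(age)

# Count how many rows were dropped
n_dropped <- nrow(people) - nrow(clean_age)
n_dropped
```

Comparing row counts before and after is a quick sanity check that the call removed only as many rows as expected.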
Common Pitfalls
One common mistake is expecting drop_na() to remove rows with missing values only in some columns without specifying those columns. By default, it removes rows with NA in any column. Another pitfall is forgetting to load the tidyr package before using drop_na().
r
library(tidyr)

people <- tibble(
  name = c("Alice", "Bob", "Carol"),
  age = c(25, NA, 30),
  city = c("NY", "LA", NA)
)

# Wrong: expecting to remove NA only in 'age' but not specifying it
# This removes rows with NA in any column
drop_na(people)

# Correct: specify 'age' to remove rows with NA only in 'age'
drop_na(people, age)
Output
# A tibble: 1 × 3
  name    age city
  <chr> <dbl> <chr>
1 Alice    25 NY

# A tibble: 2 × 3
  name    age city
  <chr> <dbl> <chr>
1 Alice    25 NY
2 Carol    30 NA
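Both pitfalls can be guarded against with a quick check before dropping anything. The sketch below, assuming tidyr is installed, counts the NAs per column with base R and then calls the function through the tidyr:: prefix, which works even when library(tidyr) has not been run. The na_counts and result names are illustrative.

```r
# No library() call here: drop_na() is reached through the tidyr:: prefix
people <- data.frame(
  name = c("Alice", "Bob", "Carol"),
  age  = c(25, NA, 30),
  city = c("NY", "LA", NA)
)

# Count missing values per column to see which columns would trigger a drop
na_counts <- colSums(is.na(people))
na_counts

# Drop rows only where 'age' is missing
result <- tidyr::drop_na(people, age)
result
```

Looking at the per-column counts first makes it clear whether an unqualified drop_na(people) would discard more rows than intended.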
Quick Reference
| Usage | Description |
|---|---|
| drop_na(data) | Remove rows that have a missing value in any column |
| drop_na(data, col1, col2) | Remove rows with missing values only in specified columns |
| Requires tidyr package | Load with library(tidyr) before use |
| Return value | Same type as the input (a tibble stays a tibble), without the rows containing NA in the checked columns |
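To check what drop_na() returns and that the input is left untouched, the following small sketch can be run. It assumes tidyr is installed; df is a made-up base data.frame rather than a tibble.

```r
library(tidyr)

# A base data.frame (not a tibble) with one missing value
df <- data.frame(x = c(1, NA, 3), y = c("a", "b", "c"))

out <- drop_na(df)

class(out)  # class of the result
nrow(out)   # rows remaining after the drop
nrow(df)    # the original still has all of its rows
```

Keeping the result in a separate variable, as here, leaves the original data available for comparison.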
Key Takeaways
Use drop_na() to remove rows with missing values from data frames in R.
Specify columns in drop_na() to target missing values only in those columns.
Always load the tidyr package with library(tidyr) before using drop_na().
drop_na() returns a new data frame; the original is not modified unless you assign the result back to it.
If no columns are specified, drop_na() removes rows with NA in any column.