How to Use complete.cases in R for Missing Data Handling
In R, use complete.cases() to find rows in a data frame or vector that have no missing values (NA). It returns a logical vector indicating which rows are complete, allowing you to filter out incomplete data easily.
Syntax
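The basic syntax of complete.cases() is:
r
complete.cases(x)
Here x can be a vector, matrix, or data frame. The function returns a logical vector with one element per row of x (or per element, if x is a vector): TRUE where there are no NA values and FALSE otherwise. The full signature is complete.cases(...), so several vectors, matrices, or data frames with the same number of rows can be passed together and are checked row by row.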
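To make the return value concrete for inputs other than a data frame, here is a minimal sketch using a matrix and a vector. The objects m and v are made up for illustration.

```r
# A matrix with one NA in the third row (values fill column by column)
m <- matrix(c(1, 2, NA, 4, 5, 6), nrow = 3)

# One logical value per row of the matrix
mat_result <- complete.cases(m)

# A vector with the same length as the number of rows in m
v <- c(1, NA, 3)

# Passing both: a row counts as complete only if it has no NA in v or m
both_result <- complete.cases(v, m)

print(mat_result)
print(both_result)
```

Here the first call flags only the third row as incomplete, while the second also flags the second row because v is NA there.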
Example
This example shows how to use complete.cases() to filter out rows with missing values from a data frame.
r
data <- data.frame(
  name = c("Alice", "Bob", "Carol", "David"),
  age = c(25, NA, 30, 22),
  score = c(88, 92, NA, 75)
)

# Show original data
print(data)

# Use complete.cases to find rows without NA
complete_rows <- complete.cases(data)
print(complete_rows)

# Filter data to keep only complete rows
clean_data <- data[complete_rows, ]
print(clean_data)
Output
   name age score
1 Alice  25    88
2   Bob  NA    92
3 Carol  30    NA
4 David  22    75
[1]  TRUE FALSE FALSE  TRUE
   name age score
1 Alice  25    88
4 David  22    75
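The example above drops a row if any column is NA. If only some columns matter, one possible variation is to pass just those columns to complete.cases(). The sketch below reuses the same data and assumes, purely for illustration, that a missing score is acceptable but a missing age is not.

```r
data <- data.frame(
  name  = c("Alice", "Bob", "Carol", "David"),
  age   = c(25, NA, 30, 22),
  score = c(88, 92, NA, 75)
)

# Check completeness on the name and age columns only;
# an NA in score no longer causes a row to be dropped
age_known <- data[complete.cases(data[, c("name", "age")]), ]
print(age_known)
```

Carol's row is kept here even though her score is NA, because score is not among the columns being checked.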
Common Pitfalls
One common mistake is to assume complete.cases() removes rows automatically. It only returns a logical vector; you must subset your data explicitly.
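Another pitfall is calling it on a plain vector and expecting a data frame back. For a vector, complete.cases() returns one logical value per element, and subsetting with it gives a shorter vector.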
r
data <- data.frame(x = c(1, NA, 3), y = c(NA, 2, 3))

# Wrong: just calling complete.cases does not remove rows
complete.cases(data)

# Right: subset data using complete.cases
clean_data <- data[complete.cases(data), ]
print(clean_data)
Output
[1] FALSE FALSE  TRUE
  x y
3 3 3
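The second pitfall, using complete.cases() on a vector, can be sketched as follows. The vector x is a made-up example.

```r
x <- c(4, NA, 7, NA)

# For a vector, the result has one logical value per element
keep <- complete.cases(x)

# Subsetting returns a shorter vector, not a data frame
x_clean <- x[keep]

# For an atomic vector like this one, the result matches !is.na(x)
same_as_is_na <- identical(keep, !is.na(x))

print(keep)
print(x_clean)
print(same_as_is_na)
```

For a single vector, !is.na(x) is the more common idiom. complete.cases() is most useful when several columns have to be checked at once.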
Quick Reference
Tips for using complete.cases():
- Use it to identify rows without any NA values.
- Subset your data frame with it to remove incomplete rows.
- Works on vectors, matrices, and data frames.
- Returns a logical vector matching the number of rows or elements.
Key Takeaways
Use complete.cases(x) to get a logical vector marking rows without missing values.
Subset your data frame with complete.cases to remove rows containing NA.
complete.cases works on vectors, matrices, and data frames alike.
It does not remove rows by itself; you must subset your data explicitly.
Check the logical output before subsetting to understand which rows are complete.
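One way to check the logical output before subsetting is to count and locate the incomplete rows first. The sketch below uses only base R functions; the variable names are illustrative.

```r
data <- data.frame(x = c(1, NA, 3), y = c(NA, 2, 3))

ok <- complete.cases(data)

# How many rows would be kept and how many dropped
n_complete   <- sum(ok)
n_incomplete <- sum(!ok)

# Positions of the rows that would be dropped
incomplete_rows <- which(!ok)

# Number of NA values in each column, to see where the gaps are
na_per_column <- colSums(is.na(data))

print(n_complete)
print(n_incomplete)
print(incomplete_rows)
print(na_per_column)
```

If a large share of rows turns out to be incomplete, it may be worth looking at which columns cause the loss before filtering.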