
Handling missing values (drop_na, fill) in R Programming - Step-by-Step Execution

Concept Flow - Handling missing values (drop_na, fill)
Start with data frame
→ Check for missing values
→ drop_na(): remove rows with NA, or fill(): replace NA with a neighboring value
→ Cleaned data frame
Start with a data frame, check for missing values, then either remove rows with missing data using drop_na() or fill missing values using fill(), resulting in a cleaned data frame.
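The "check for missing values" step can be sketched in base R before any tidyr function is called. This is a minimal illustration using the same example data frame as the rest of this walkthrough; the variable names has_missing, na_per_column, and complete_rows are chosen here for illustration only.

```r
# Same example data frame used in this walkthrough
data <- data.frame(x = c(1, NA, 3), y = c(NA, 2, 3))

# TRUE if the data frame contains at least one NA
has_missing <- any(is.na(data))

# Number of NA values in each column
na_per_column <- colSums(is.na(data))

# One logical per row: TRUE when the row has no NA in any column
complete_rows <- complete.cases(data)

print(has_missing)
print(na_per_column)
print(complete_rows)
```

Here each column has one NA, and only the third row is complete, which is the row drop_na() would keep.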
Execution Sample
R Programming
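library(tidyr)
# Example data frame: x is NA in row 2, y is NA in row 1
data <- data.frame(x = c(1, NA, 3), y = c(NA, 2, 3))
# Keep only the rows that have no NA in any column
data_clean <- drop_na(data)
# Carry the last non-NA value of x down into each NA below it
data_filled <- fill(data, x, .direction = "down")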
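This code creates a data frame with missing values, then builds two separate results from it: data_clean, where drop_na() has removed every row containing an NA, and data_filled, where fill() has filled the missing x value downward. Both calls start from the original data, so fill() is not applied to the output of drop_na().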
Execution Table
| Step | Data Frame State | Action | Resulting Data Frame |
| --- | --- | --- | --- |
| 1 | x: 1, NA, 3; y: NA, 2, 3 | Initial data frame with missing values | x: 1, NA, 3; y: NA, 2, 3 |
| 2 | x: 1, NA, 3; y: NA, 2, 3 | Apply drop_na() to remove rows with any NA | x: 3; y: 3 (only row 3 remains) |
| 3 | x: 1, NA, 3; y: NA, 2, 3 | Apply fill(x, .direction = "down") to fill NA in x | x: 1, 1, 3; y: NA, 2, 3 |
| 4 | x: 1, 1, 3; y: NA, 2, 3 | No further action | Final cleaned data frames ready |
💡 drop_na() leaves no missing values in data_clean. data_filled still has an NA in y, because fill() was applied only to x.
Variable Tracker
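| Variable | Start | After drop_na | After fill | Final |
| --- | --- | --- | --- | --- |
| data | x: 1, NA, 3; y: NA, 2, 3 | Unchanged | Unchanged | Unchanged |
| data_clean | (not yet created) | x: 3; y: 3 | Unchanged | x: 3; y: 3 |
| data_filled | (not yet created) | (not yet created) | x: 1, 1, 3; y: NA, 2, 3 | x: 1, 1, 3; y: NA, 2, 3 |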
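The "Unchanged" entries for data follow from the fact that drop_na() and fill() return new data frames rather than modifying their input. A short sketch that checks this, assuming tidyr is installed (the name original is introduced here only for the comparison):

```r
library(tidyr)

data <- data.frame(x = c(1, NA, 3), y = c(NA, 2, 3))
original <- data  # copy kept only so we can compare afterwards

data_clean <- drop_na(data)
data_filled <- fill(data, x, .direction = "down")

# data still holds both NA values after the two calls
still_same <- identical(data, original)
print(still_same)
```

To overwrite the original instead, assign the result back to the same name, for example data <- drop_na(data).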
Key Moments - 2 Insights
Why does drop_na() remove the first and second rows but keep the third?
drop_na() removes every row that has an NA in any column. The first row has NA in y and the second row has NA in x, so both are removed. The third row has no NA in any column, so it stays (see execution_table step 2).
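The "any column" rule applies when no columns are named. drop_na() also accepts column names, in which case only those columns are inspected. A small sketch of the difference on the same data, assuming tidyr is installed (all_cols and x_only are illustrative names):

```r
library(tidyr)

data <- data.frame(x = c(1, NA, 3), y = c(NA, 2, 3))

# No columns named: a row is dropped if ANY column is NA
all_cols <- drop_na(data)

# Column named: only an NA in x causes a row to be dropped,
# so row 1 survives even though its y value is NA
x_only <- drop_na(data, x)

print(all_cols)
print(x_only)
```

all_cols keeps only the third row, while x_only keeps the first and third rows.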
How does fill() decide what value to use to replace NA in the x column?
fill() replaces NA with the last non-NA value above it in the column (downward direction). So the NA in row 2 is replaced by 1 from row 1 (see execution_table step 3).
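One consequence of this rule is that a downward fill cannot fill an NA in the first row, because there is no value above it to carry down. The sketch below illustrates this with the y column of the same data, assuming tidyr is installed (filled_x and filled_y are illustrative names):

```r
library(tidyr)

data <- data.frame(x = c(1, NA, 3), y = c(NA, 2, 3))

# x: the NA in row 2 takes the value 1 from row 1
filled_x <- fill(data, x, .direction = "down")

# y: the NA is in row 1, so there is nothing above it to carry down
filled_y <- fill(data, y, .direction = "down")

print(filled_x$x)
print(filled_y$y)
```

After the call, filled_y$y still starts with NA, so a leading NA needs a different direction or a different strategy.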
Visual Quiz - 3 Questions
Test your understanding
Look at the execution_table at step 2. What rows remain after drop_na() is applied?
A. First and third rows
B. Only the third row
C. Only the first row
D. All rows
💡 Hint
Check the 'Resulting Data Frame' column at step 2 in execution_table.
According to variable_tracker, what is the value of x in the second row after fill() is applied?
A. NA
B. 3
C. 1
D. 2
💡 Hint
Look at the 'data_filled' row, 'After fill' column, in variable_tracker for the x variable.
If we changed fill() direction to 'up', what would happen to the NA in x at row 2?
A. It would be replaced by 3 from row 3
B. It would remain NA
C. It would be replaced by 1 from row 1
D. All rows would be removed
💡 Hint
fill() with direction 'up' fills each NA with the next non-NA value below it (see fill() behavior).
Concept Snapshot
Handling missing values in R with tidyr:
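- drop_na(data): removes every row that has an NA in any column
- fill(data, column, .direction): replaces NA in the named column with the nearest non-NA value in the chosen direction
- drop_na removes rows; fill replaces NA and keeps every row
- Use fill direction "down" (the default) or "up" to control filling; "downup" and "updown" apply both in sequence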
- Clean data frames ready for analysis
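To compare the fill directions side by side, here is a small sketch on a hypothetical column with NA values at the start, middle, and end. The data frame df and its column v are invented for this illustration and are not part of the example above; tidyr is assumed to be installed.

```r
library(tidyr)

# Hypothetical readings with NA at the start, in the middle, and at the end
df <- data.frame(v = c(NA, 10, NA, 20, NA))

# "down": each NA takes the last non-NA value above it; the leading NA stays
down <- fill(df, v, .direction = "down")$v      # NA 10 10 20 20

# "up": each NA takes the next non-NA value below it; the trailing NA stays
up <- fill(df, v, .direction = "up")$v          # 10 10 20 20 NA

# "downup": fill down first, then fill whatever is left upward
downup <- fill(df, v, .direction = "downup")$v  # 10 10 10 20 20

print(down)
print(up)
print(downup)
```

Which direction is appropriate depends on what the rows represent; carrying a value forward usually makes sense only when the rows are in a meaningful order.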
Full Transcript
We start with a data frame containing missing values (NA). We use drop_na() to remove any rows that have missing values in any column, leaving only complete rows. Alternatively, we use fill() to replace missing values in a specific column by carrying the nearest known value forward or backward. In the example, drop_na() removes the first row (NA in y) and the second row (NA in x), and fill() replaces the NA in the second row's x column with the value from the first row. The y column is not filled, so its NA remains in data_filled. This way, we clean the data by either removing or filling missing values, preparing it for further use.