How to Use Random Forest in R: Simple Guide with Example
To use randomForest in R, first install and load the randomForest package, then create a model with randomForest(formula, data). This trains a random forest model for classification or regression based on your data.
Syntax
The basic syntax to create a random forest model in R is:
randomForest(formula, data, ntree, mtry)
Where:
- formula defines the target and predictors (e.g., target ~ . means the target is predicted by all other variables)
- data is your dataset
- ntree is the number of trees to grow (default 500)
- mtry is the number of variables randomly sampled at each split
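To see what the defaults resolve to in practice, the sketch below fits one classification forest and one regression forest without setting ntree or mtry, then reads the values stored on the fitted objects. It assumes the built-in iris and mtcars datasets; the names clf and reg are illustrative. Per the package documentation, mtry defaults to the square root of the number of predictors for classification and to one third of them for regression.

```r
library(randomForest)

set.seed(1)  # tree construction is random; the defaults read below are not

# Classification: factor target, 4 predictors
clf <- randomForest(Species ~ ., data = iris)

# Regression: numeric target, 10 predictors
reg <- randomForest(mpg ~ ., data = mtcars)

clf$type   # "classification"
clf$ntree  # 500
clf$mtry   # 2, i.e. floor(sqrt(4))
reg$type   # "regression"
reg$mtry   # 3, i.e. floor(10 / 3)
```

The model type follows from the target column: a factor gives classification, a numeric vector gives regression.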
```r
# mtry is left unset so the package picks its default
randomForest(target ~ ., data = your_data, ntree = 500)
```
Example
This example shows how to train a random forest model to classify species in the famous Iris dataset.
```r
library(randomForest)

# Load iris dataset
data(iris)

# Train random forest model to predict Species
set.seed(42)  # for reproducibility
model <- randomForest(Species ~ ., data = iris, ntree = 100)

# Print model summary
print(model)

# Predict on training data
predictions <- predict(model, iris)

# Show confusion matrix
table(Predicted = predictions, Actual = iris$Species)
```
Output
```
Call:
 randomForest(formula = Species ~ ., data = iris, ntree = 100)
               Type of random forest: classification
                     Number of trees: 100
No. of variables tried at each split: 2

        OOB estimate of  error rate: 4.67%
Confusion matrix:
            Actual
Predicted    setosa versicolor virginica
  setosa         50          0         0
  versicolor      0         47         3
  virginica       0          2        48

            Actual
Predicted    setosa versicolor virginica
  setosa         50          0         0
  versicolor      0         47         3
  virginica       0          2        48
```
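Predicting on the same rows the model was trained on tends to look better than the out-of-bag estimate, so a held-out split gives a fairer check of how the model behaves on new data. The following is a minimal sketch of that idea; the 105/45 split and the names train_idx, train_set, and test_set are illustrative choices, not part of the package.

```r
library(randomForest)

set.seed(42)

# Keep 105 of the 150 rows (70%) for training, the rest for testing
train_idx <- sample(nrow(iris), size = 105)
train_set <- iris[train_idx, ]
test_set  <- iris[-train_idx, ]

# Fit on the training rows only
model <- randomForest(Species ~ ., data = train_set, ntree = 100)

# Predict on rows the model has not seen
test_pred <- predict(model, newdata = test_set)

# Confusion matrix and accuracy on the held-out rows
conf_mat <- table(Predicted = test_pred, Actual = test_set$Species)
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
print(conf_mat)
print(accuracy)
```

The exact counts depend on the seed and the split, so they are not shown here.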
Common Pitfalls
- Not setting a seed with set.seed() gives different results on each run, which makes debugging hard.
- Using too few trees (ntree) can reduce accuracy.
- Not checking data types: the target must be a factor for classification.
- Ignoring missing values: randomForest() does not handle NAs automatically. Its default na.action is na.fail, so the call stops with an error.
```r
library(randomForest)

# Wrong: no seed, few trees, target is not a factor
iris_wrong <- iris
iris_wrong$Species <- as.character(iris_wrong$Species)  # character target (wrong for classification)
# try() keeps the script running: with a character target this call does not fit a classifier
model_wrong <- try(randomForest(Species ~ ., data = iris_wrong, ntree = 10))

# Right: set seed, use a factor target, more trees
set.seed(123)
iris_right <- iris
iris_right$Species <- as.factor(iris_right$Species)
model_right <- randomForest(Species ~ ., data = iris_right, ntree = 100)
print(model_right)
```
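For the missing-value pitfall, one option is to pass an explicit na.action. The sketch below blanks out a few values in a copy of iris (artificial NAs, purely for illustration) and compares the default behaviour with na.omit and with na.roughfix, a helper in the randomForest package that fills in column medians or most frequent levels. The names iris_na, default_fit, fit_omit, and fit_fix are illustrative.

```r
library(randomForest)

set.seed(7)

# Copy iris and blank out a few predictor values (artificial, for illustration)
iris_na <- iris
iris_na$Sepal.Length[c(3, 60, 120)] <- NA
iris_na$Petal.Width[c(10, 75)] <- NA

# Default na.action is na.fail, so this call stops with an error
default_fit <- try(randomForest(Species ~ ., data = iris_na), silent = TRUE)
inherits(default_fit, "try-error")  # TRUE

# Option 1: drop incomplete rows before fitting
fit_omit <- randomForest(Species ~ ., data = iris_na, na.action = na.omit)

# Option 2: impute medians (numeric) or most frequent levels (factor)
fit_fix <- randomForest(Species ~ ., data = iris_na, na.action = na.roughfix)
```

Dropping rows is the simpler choice when few values are missing. Rough imputation keeps every row at the cost of some added noise.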
Quick Reference
Tips for using random forest in R:
- Use randomForest() from the randomForest package.
- Set ntree to at least 100 for stable results.
- Use set.seed() for reproducible models.
- Ensure classification targets are factors.
- Use predict() to get predictions from the model.
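The seed and ntree tips can be checked directly. The sketch below, using the built-in iris data, fits the same forest twice with the same seed to confirm the results match, then reads the out-of-bag error after different numbers of trees from the fitted object's err.rate matrix. The helper name fit_forest is hypothetical.

```r
library(randomForest)

# Hypothetical helper: reset the seed, then fit the same forest
fit_forest <- function(seed, ntree = 200) {
  set.seed(seed)
  randomForest(Species ~ ., data = iris, ntree = ntree)
}

m1 <- fit_forest(123)
m2 <- fit_forest(123)

# Same seed, same data, same settings: identical out-of-bag predictions
identical(m1$predicted, m2$predicted)  # TRUE

# OOB error rate after 5, 50, and 200 trees (err.rate has one row per tree)
m1$err.rate[c(5, 50, 200), "OOB"]
```

If the error is still moving between the later rows, that is a hint that more trees may help.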
Key Takeaways
Install and load the randomForest package before use.
Use formula syntax like target ~ predictors to train the model.
Set a seed with set.seed() for reproducible results.
Ensure classification targets are factors, not characters.
Use predict() to apply the model to new data.