
How to Use Random Forest in R: Simple Guide with Example

To use random forest in R, install and load the randomForest package, then fit a model with randomForest(formula, data). The function fits a classification forest when the target is a factor and a regression forest when the target is numeric.
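
As a minimal sketch of the install-and-load step described above: the check with requireNamespace() and the CRAN mirror URL are assumptions added here for illustration, so adjust the mirror to whatever your setup uses.

```r
# Install the package only if it is not already available
# (the mirror URL is an assumption; any CRAN mirror works)
if (!requireNamespace("randomForest", quietly = TRUE)) {
  install.packages("randomForest", repos = "https://cloud.r-project.org")
}

# Load the package for the current session
library(randomForest)
```

Installation is needed once per R library; library() has to be called in every new session.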
📐

Syntax

The basic syntax to create a random forest model in R is:

  • randomForest(formula, data, ntree, mtry)

Where:

  • formula defines the target and predictors (e.g., target ~ . means target predicted by all other variables)
  • data is your dataset
  • ntree is the number of trees to grow (default 500)
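  • mtry is the number of variables randomly sampled as candidates at each split (default: the square root of the number of predictors for classification, one third of them for regression)
r
# mtry is left out so the package default is used; mtry = NULL is not a valid value
randomForest(target ~ ., data = your_data, ntree = 500)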
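
To make the arguments above concrete, here is a small sketch that sets ntree and mtry explicitly. It uses the built-in mtcars dataset with mpg as a numeric target, which is a choice made here for illustration and is not part of the original guide. Because the target is numeric, the result is a regression forest.

```r
library(randomForest)

set.seed(42)  # for reproducibility

# mpg is numeric, so randomForest() fits a regression forest.
# mtcars has 10 predictors; mtry = 3 samples 3 of them at each split.
rf_reg <- randomForest(mpg ~ ., data = mtcars, ntree = 500, mtry = 3)

# The fitted object records the settings that were used
rf_reg$type   # kind of forest that was fitted
rf_reg$ntree  # number of trees grown
rf_reg$mtry   # variables tried at each split
```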
💻

Example

This example shows how to train a random forest model to classify species in the famous Iris dataset.

r
library(randomForest)

# Load iris dataset
data(iris)

# Train random forest model to predict Species
set.seed(42)  # for reproducibility
model <- randomForest(Species ~ ., data = iris, ntree = 100)

# Print model summary
print(model)

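# Predict on the training data (this tends to look more accurate than the OOB estimate, which scores each row only with trees that did not see it)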
predictions <- predict(model, iris)

# Show confusion matrix
table(Predicted = predictions, Actual = iris$Species)
Output
Call:
 randomForest(formula = Species ~ ., data = iris, ntree = 100)
               Type of random forest: classification
                     Number of trees: 100
No. of variables tried at each split: 2

        OOB estimate of error rate: 4.67%
Confusion matrix:
            Actual
Predicted    setosa versicolor virginica
  setosa         50          0         0
  versicolor      0         47         3
  virginica       0          2        48

            Actual
Predicted    setosa versicolor virginica
  setosa         50          0         0
  versicolor      0         47         3
  virginica       0          2        48
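
The example above scores the model on the same rows it was trained on. A common alternative, sketched here under the assumption of a simple 70/30 random split (the split ratio and the variable names are illustrative, not from the original), is to hold out part of the data and evaluate on that.

```r
library(randomForest)

set.seed(42)  # for reproducibility

# Randomly pick 70% of the rows for training; the rest is held out
train_idx <- sample(nrow(iris), size = round(0.7 * nrow(iris)))
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

# Fit on the training rows only
fit <- randomForest(Species ~ ., data = train, ntree = 100)

# Evaluate on rows the model has never seen
test_pred <- predict(fit, newdata = test)
conf_mat  <- table(Predicted = test_pred, Actual = test$Species)
accuracy  <- mean(test_pred == test$Species)

print(conf_mat)
print(accuracy)
```

The held-out accuracy and the OOB error reported by print(fit) are both estimates of performance on unseen data, so they should be in the same range.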
⚠️

Common Pitfalls

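  • Not setting a seed with set.seed(): the forest is built from random samples, so results differ from run to run, which makes debugging hard.
  • Using too few trees (ntree): predictions and the OOB error estimate become unstable and accuracy can drop.
  • Not checking data types: the target must be a factor for classification; a numeric target is treated as regression.
  • Ignoring missing values: by default randomForest() stops with an error when the data contain NAs, so remove or impute them first (for example with na.action = na.omit or na.roughfix()).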
r
## Wrong: no seed, too few trees, target is not a factor
library(randomForest)

# Convert target to character (wrong for classification)
iris_wrong <- iris
iris_wrong$Species <- as.character(iris_wrong$Species)

# Train with few trees. A character target is not treated as class labels,
# so this call is expected to fail; try() lets the rest of the script run.
model_wrong <- try(randomForest(Species ~ ., data = iris_wrong, ntree = 10))

## Right: set a seed, use a factor target, grow more trees
set.seed(123)
iris_right <- iris
iris_right$Species <- as.factor(iris_right$Species)  # already a factor in iris; shown for clarity
model_right <- randomForest(Species ~ ., data = iris_right, ntree = 100)

print(model_right)
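
The missing-value pitfall is not covered by the code above, so here is a hedged sketch of it. The NAs are injected artificially into a copy of iris (the row positions are arbitrary), and the two remedies shown, na.action = na.omit and na.roughfix(), are only two of several possible approaches.

```r
library(randomForest)

set.seed(1)  # for reproducibility

# Copy iris and blank out a few values to simulate missing data
iris_na <- iris
iris_na$Sepal.Length[c(3, 60, 120)] <- NA

# With the default na.action, the call stops with an error
res <- try(randomForest(Species ~ ., data = iris_na, ntree = 100), silent = TRUE)
inherits(res, "try-error")

# Option 1: drop rows that contain missing values
fit_omit <- randomForest(Species ~ ., data = iris_na, ntree = 100,
                         na.action = na.omit)

# Option 2: fill NAs with column medians (numeric) or modes (factor), then fit
iris_filled <- na.roughfix(iris_na)
fit_fix <- randomForest(Species ~ ., data = iris_filled, ntree = 100)
```

Dropping rows is simplest when only a few are affected; imputation keeps all rows but adds values that were never observed.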
📊

Quick Reference

Tips for using random forest in R:

  • Use randomForest() from the randomForest package.
  • Keep ntree at the default of 500, or at least 100, and check with plot(model) that the OOB error has leveled off.
  • Use set.seed() for reproducible models.
  • Ensure classification targets are factors.
  • Use predict() to get predictions from the model.
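
One way to judge whether ntree is large enough is to look at how the OOB error changes as trees are added. The sketch below reads it from the err.rate component of a classification forest; the checkpoints 10, 100 and 500 are arbitrary choices for illustration.

```r
library(randomForest)

set.seed(42)  # for reproducibility
fit <- randomForest(Species ~ ., data = iris, ntree = 500)

# err.rate has one row per tree; the "OOB" column is the overall
# out-of-bag error after that many trees have been grown
oob <- fit$err.rate[, "OOB"]

# Compare the error at a few forest sizes
oob[c(10, 100, 500)]

# plot(fit) draws the same curves if a graphics device is available
```

If the error is still moving noticeably at the last tree, growing more trees is usually worth the extra time.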

Key Takeaways

  • Install and load the randomForest package before use.
  • Use formula syntax like target ~ predictors to train the model.
  • Set a seed with set.seed() for reproducible results.
  • Ensure classification targets are factors, not characters.
  • Use predict() to apply the model to new data.
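
As a short sketch of applying a model to new data: the two flowers below are made-up measurements for illustration, and the column names must match the predictors used in training.

```r
library(randomForest)

set.seed(42)  # for reproducibility
fit <- randomForest(Species ~ ., data = iris, ntree = 100)

# Hypothetical new observations with the same predictor columns as iris
new_flowers <- data.frame(
  Sepal.Length = c(5.0, 6.5),
  Sepal.Width  = c(3.4, 3.0),
  Petal.Length = c(1.5, 5.5),
  Petal.Width  = c(0.2, 2.0)
)

# Predicted class for each row
pred_class <- predict(fit, newdata = new_flowers)

# Share of trees voting for each class, one row per observation
pred_prob <- predict(fit, newdata = new_flowers, type = "prob")

print(pred_class)
print(pred_prob)
```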