How to Use the train Function in the caret Package in R
Use the train function from the caret package in R to build predictive models by specifying a formula, data, method, and optional training controls. It simplifies model training by handling resampling and tuning automatically.

Syntax
The train function has this basic syntax:
```r
train(formula, data, method = "rf", trControl = trainControl(), tuneGrid = NULL)
```
- formula: Defines the target and predictors, e.g., target ~ .
- data: The dataset to train the model on
- method: The model type, like "lm" for linear regression or "rpart" for decision trees (defaults to "rf", a random forest, if omitted)
- trControl: Optional, controls resampling and tuning
- tuneGrid: Optional, specifies the tuning parameter values to try

Example
This example shows how to train a linear regression model to predict mpg from other variables in the mtcars dataset.
```r
library(caret)

# Set seed for reproducibility
set.seed(123)

# Train linear regression model
model <- train(mpg ~ ., data = mtcars, method = "lm")

# Show model summary
print(model)

# Predict mpg for first 5 cars
predictions <- predict(model, mtcars[1:5, ])
print(predictions)
```
Output
Exact figures depend on the seed and on your R and caret versions, so treat the numbers below as illustrative.
```
Linear Regression

32 samples
10 predictors

No pre-processing
Resampling: Bootstrapped (25 reps)
Summary of sample sizes: 32, 32, 32, 32, 32, 32, ...
Resampling results:

  RMSE      Rsquared   MAE
  2.593123  0.8431234  2.123456
```
Predictions:
```
       1        2        3        4        5
21.44175 21.44175 22.84875 21.44175 18.92005
```
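The example above relies on the default resampling and has nothing to tune. As a minimal sketch of the optional trControl and tuneGrid arguments described under Syntax, the code below trains a decision tree with 5-fold cross-validation over a few values of the rpart complexity parameter cp. The names ctrl, grid, and tree_model, and the specific cp values, are assumptions chosen for illustration. With only 32 rows, caret may warn about missing resampled performance measures, and the rpart package must be installed.

```r
library(caret)

set.seed(123)

# 5-fold cross-validation instead of the default bootstrap resampling
ctrl <- trainControl(method = "cv", number = 5)

# Candidate values for rpart's complexity parameter (arbitrary illustrative choices)
grid <- expand.grid(cp = c(0.001, 0.01, 0.1))

# Train one tree per cp value and keep the best by cross-validated RMSE
tree_model <- train(
  mpg ~ .,
  data = mtcars,
  method = "rpart",
  trControl = ctrl,
  tuneGrid = grid
)

# One row per cp value with its resampled RMSE, Rsquared, and MAE
print(tree_model$results)

# The cp value that train selected
print(tree_model$bestTune)
```

The column names in the grid must match the tuning parameter names of the chosen method, which is why the grid here has a single column called cp.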
Common Pitfalls
Common mistakes when using train include:
- Not setting a seed, which makes results hard to reproduce.
- Using incorrect method names or unsupported models.
- Forgetting to load the caret package before calling train.
- Not specifying trControl, so train falls back to its default resampling (25 bootstrap reps), which may not be the scheme you want for estimating performance and detecting overfitting.
```r
library(caret)

# Wrong method name example (will error)
# train(mpg ~ ., data = mtcars, method = "linear")

# Correct method name
set.seed(123)
model <- train(mpg ~ ., data = mtcars, method = "lm")
```
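One way to avoid a wrong method name is to check it against caret's own model list before training. The sketch below uses modelLookup() for this; the variable names known_methods and rpart_params are assumptions made for illustration.

```r
library(caret)

# All method names that caret recognizes
known_methods <- unique(modelLookup()$model)

# Check candidate names before passing them to train()
print("lm" %in% known_methods)
print("linear" %in% known_methods)

# Tuning parameters that train() can vary for a given method
rpart_params <- modelLookup("rpart")
print(rpart_params)
```

The parameter column of the lookup result lists the names to use as column names in tuneGrid.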
Quick Reference
| Parameter | Description | Example |
|---|---|---|
| formula | Defines target and predictors | mpg ~ . |
| data | Dataset to train on | mtcars |
| method | Model type | "lm", "rpart", "rf" |
| trControl | Resampling and tuning control | trainControl(method = "cv", number = 5) |
| tuneGrid | Grid of tuning parameters | expand.grid(cp = 0.01) |
Key Takeaways
- Use train() to easily build and tune models with formula and data inputs.
- Always set a seed for reproducible results when training models.
- Specify the correct method name for the model you want to train.
- Use trControl to choose the resampling scheme, so that performance estimates can reveal overfitting.
- Check the caret documentation for supported methods and tuning options.