
How to Use tidymodels in R: Simple Guide with Examples

To use tidymodels in R, first load the package with library(tidymodels). Then split your data, define a recipe for preprocessing, create a model specification, combine the recipe and model in a workflow, and fit that workflow on the training set. The framework keeps each machine learning step explicit and consistent across model types.


Syntax

The basic steps, in the order they usually run, are:

Load package: library(tidymodels)
Data splitting: Use initial_split() to create training and testing sets
Model specification: Define the model type and engine, e.g., linear_reg() %>% set_engine("lm")
Recipe: Declare preprocessing steps with recipe()
Workflow: Combine model and recipe with workflow()
Fit model: Use fit() on the workflow with training data

library(tidymodels)

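# Split data (dataset and target are placeholders for your own data frame and outcome column)
set.seed(123) # make the random split reproducible
split <- initial_split(dataset) # by default, 3/4 of the rows go to training
training_data <- training(split)
testing_data <- testing(split)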

# Define model
model <- linear_reg() %>% set_engine("lm")

# Create recipe
rec <- recipe(target ~ ., data = training_data) %>% step_normalize(all_predictors())

# Create workflow
wf <- workflow() %>% add_model(model) %>% add_recipe(rec)

# Fit model
fitted_model <- fit(wf, data = training_data)
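To check what a recipe does to the data before fitting, you can estimate it with prep() and apply it with bake(). The sketch below is illustrative only: it assumes the built-in mtcars data and its mpg column as stand-ins for the dataset and target placeholders, and the variable names are hypothetical.

```r
library(tidymodels)

# mtcars and mpg stand in for the dataset/target placeholders (an assumption)
rec <- recipe(mpg ~ ., data = mtcars) %>%
  step_normalize(all_predictors())

# prep() estimates each predictor's mean and standard deviation from the data
prepped <- prep(rec, training = mtcars)

# bake(new_data = NULL) returns the preprocessed version of the training data
baked <- bake(prepped, new_data = NULL)

# After normalization a predictor should have mean ~0 and standard deviation ~1
disp_mean <- mean(baked$disp)
disp_sd <- sd(baked$disp)
```

Inside a workflow these two steps run automatically when fit() and predict() are called, so prep() and bake() are mainly useful for inspection.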


Example

This example shows how to build a linear regression model predicting mpg from the mtcars dataset using tidymodels. It includes data splitting, recipe creation, model specification, workflow setup, and fitting the model.

library(tidymodels)

# Load data
data(mtcars)

# Split data
set.seed(123)
split <- initial_split(mtcars, prop = 0.8)
train_data <- training(split)
test_data <- testing(split)

# Define recipe
rec <- recipe(mpg ~ ., data = train_data) %>% step_normalize(all_predictors())

# Define model
model <- linear_reg() %>% set_engine("lm")

# Create workflow
wf <- workflow() %>% add_model(model) %>% add_recipe(rec)

# Fit model
fitted <- fit(wf, data = train_data)

# Show the summary of the underlying lm fit
summary(extract_fit_engine(fitted))

Output

Call:
lm(formula = mpg ~ ., data = data)

Residuals:
    Min      1Q  Median      3Q     Max
-3.9415 -1.6009 -0.1821  1.0509  5.8543

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 12.30337    18.7179   0.657    0.518
cyl         -0.11144     1.0450  -0.107    0.916
disp        -0.01912     0.0096  -1.991    0.059 .
hp          -0.02148     0.0218  -0.985    0.337
drat         0.78711     1.6350   0.481    0.635
wt          -3.71530     1.8944  -1.961    0.063 .
qsec         0.82104     0.7308   1.123    0.274
vs           0.31776     2.1045   0.151    0.881
am           2.52023     2.0567   1.225    0.234
gear         0.65541     1.4933   0.439    0.665
carb        -0.19942     0.8287  -0.241    0.812

Residual standard error: 2.593 on 21 degrees of freedom
Multiple R-squared: 0.869, Adjusted R-squared: 0.806
F-statistic: 13.93 on 10 and 21 DF, p-value: 9.109e-07

Note: this output reports 21 residual degrees of freedom, which corresponds to a fit on all 32 rows of mtcars. Fitting on the 80% training split with normalized predictors, as in the code above, leaves fewer residual degrees of freedom and puts the coefficients on a per-standard-deviation scale, so the values you see will differ.
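The example stops at the fitted model, so the test set it created is never used. As a minimal sketch of the next step, the code below repeats the same setup, predicts on the held-out rows, and computes standard regression metrics with yardstick's metrics(). The names preds and test_metrics are hypothetical, and no output is shown because the values depend on the split.

```r
library(tidymodels)

# Same setup as the example above
data(mtcars)
set.seed(123)
split <- initial_split(mtcars, prop = 0.8)
train_data <- training(split)
test_data <- testing(split)

rec <- recipe(mpg ~ ., data = train_data) %>%
  step_normalize(all_predictors())
model <- linear_reg() %>% set_engine("lm")
wf <- workflow() %>% add_model(model) %>% add_recipe(rec)
fitted <- fit(wf, data = train_data)

# predict() applies the recipe (using training-set statistics) to the test
# rows and returns a tibble with a .pred column
preds <- predict(fitted, new_data = test_data) %>%
  bind_cols(test_data %>% select(mpg))

# Compare predictions with the true mpg values (RMSE, R-squared, MAE)
test_metrics <- metrics(preds, truth = mpg, estimate = .pred)
print(test_metrics)
```

Because the workflow carries the recipe, the test data is normalized with the means and standard deviations learned from the training data, not its own.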


Common Pitfalls

Common mistakes when using tidymodels include:

Not setting a seed before splitting data, causing inconsistent results.
Forgetting to add both the model and recipe to the workflow.
Fitting a bare model specification instead of a workflow, so the preprocessing steps in the recipe are never applied.
Choosing an engine the model type does not support, or relying on the default engine without realizing it.

Always check that your workflow includes all parts and that data is properly split.
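To illustrate the incomplete-workflow pitfall, the sketch below tries to fit a workflow that has a model but no recipe (or other preprocessor). fit() is expected to stop with an error in this case rather than fit anyway. The names wf_incomplete and result are hypothetical, and tryCatch() is used only so the script keeps running.

```r
library(tidymodels)

# A workflow with a model but no recipe, formula, or variables added
wf_incomplete <- workflow() %>%
  add_model(linear_reg() %>% set_engine("lm"))

# Capture the error instead of halting the script
result <- tryCatch(
  fit(wf_incomplete, data = mtcars),
  error = function(e) conditionMessage(e)
)

# result holds the error message as a character string
print(result)
```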

# Wrong: fitting the model specification directly, without a workflow
model <- linear_reg() %>% set_engine("lm")
fit(model, mpg ~ ., data = mtcars) # Runs, but no recipe steps are applied

# Right: use workflow with recipe
rec <- recipe(mpg ~ ., data = mtcars) %>% step_normalize(all_predictors())
wf <- workflow() %>% add_model(model) %>% add_recipe(rec)
fitted <- fit(wf, data = mtcars)
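The seed pitfall is easy to verify directly. In this small sketch (variable names are hypothetical), resetting the same seed before each call to initial_split() yields the same training rows.

```r
library(tidymodels)

# Same seed before each split
set.seed(123)
split_a <- initial_split(mtcars, prop = 0.8)

set.seed(123)
split_b <- initial_split(mtcars, prop = 0.8)

# TRUE when both splits selected exactly the same training rows
same_rows <- identical(training(split_a), training(split_b))
print(same_rows)
```

Without the set.seed() calls, two splits would generally pick different rows, and any metrics computed from them would change between runs.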


Quick Reference

Here is a quick summary of key tidymodels functions:

Function	Purpose
library(tidymodels)	Load tidymodels and its core packages (rsample, recipes, parsnip, workflows, yardstick, and others)
initial_split()	Split data into training and testing sets
recipe()	Define preprocessing steps for data
linear_reg(), rand_forest(), etc.	Specify model type
set_engine()	Choose the computational engine for the model
workflow()	Combine model and recipe into one object
fit()	Train a model specification or workflow on the training data
predict()	Make predictions on new data
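To check which engines a model type supports before calling set_engine(), parsnip provides show_engines(). A minimal sketch, with hypothetical variable names; the list printed depends on which parsnip version and extension packages are loaded.

```r
library(tidymodels)

# List the engines registered for linear_reg()
engines <- show_engines("linear_reg")
print(engines)

# Check that the engine used throughout this guide is available
has_lm <- "lm" %in% engines$engine
```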


Key Takeaways

Load tidymodels and split your data before modeling.

Use recipes to preprocess data and workflows to combine steps.

Always specify model type and engine clearly.

Fit models using workflows to keep preprocessing and modeling together.

Set a random seed for reproducible data splits.