How to Train a Model in R: Simple Steps and Example
To train a model in R, use the lm() function for linear regression by specifying a formula and a data frame. This fits the model to your data, allowing you to make predictions or analyze relationships between variables.
Syntax
The basic syntax to train a linear regression model in R is:
- lm(formula, data): Fits a linear model.
- formula: Describes the relationship, e.g., y ~ x1 + x2.
- data: The data frame containing the variables.
```r
# y is the response; x1 and x2 are predictor columns in your_data
model <- lm(y ~ x1 + x2, data = your_data)
```
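The formula is where most of the modeling choices are made, so it may help to see how its operators change the fitted terms. The sketch below uses the built-in mtcars dataset purely for illustration and compares three formulas by the coefficient names they produce. The object names (additive, interaction_only, full) are hypothetical.

```r
# Main effects only: one coefficient per predictor
additive <- lm(mpg ~ wt + hp, data = mtcars)

# Interaction term only: a single coefficient for the product wt * hp
interaction_only <- lm(mpg ~ wt:hp, data = mtcars)

# Main effects plus interaction: shorthand for wt + hp + wt:hp
full <- lm(mpg ~ wt * hp, data = mtcars)

names(coef(additive))          # "(Intercept)" "wt" "hp"
names(coef(interaction_only))  # "(Intercept)" "wt:hp"
names(coef(full))              # "(Intercept)" "wt" "hp" "wt:hp"
```

Comparing the coefficient names is a quick way to confirm that a formula expands to the terms you intended before interpreting the fit.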
Example
This example shows how to train a linear regression model to predict mpg (miles per gallon) using wt (weight) and hp (horsepower) from the built-in mtcars dataset.
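```r
# Fit mpg as a linear function of weight and horsepower
model <- lm(mpg ~ wt + hp, data = mtcars)

# Print coefficients, standard errors, and fit statistics
summary(model)
```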
Output
```
Call:
lm(formula = mpg ~ wt + hp, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max
-3.9415 -1.6009 -0.1821  1.0509  5.8543

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 37.22727    1.59879  23.285  < 2e-16 ***
wt          -3.87783    0.63273  -6.129 1.12e-06 ***
hp          -0.03177    0.00903  -3.519  0.00145 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.593 on 29 degrees of freedom
Multiple R-squared:  0.8268,	Adjusted R-squared:  0.8148
F-statistic: 69.21 on 2 and 29 DF,  p-value: 9.109e-12
```
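The printed summary is meant for reading, but the same numbers can also be pulled out programmatically. The following is a minimal sketch using base R accessors on the same model; the variable name fit is just a label chosen here.

```r
model <- lm(mpg ~ wt + hp, data = mtcars)
fit <- summary(model)

# Named numeric vector of coefficient estimates
coef(model)

# Matrix with columns: Estimate, Std. Error, t value, Pr(>|t|)
fit$coefficients

# Goodness-of-fit values shown at the bottom of the printed summary
fit$r.squared       # Multiple R-squared
fit$adj.r.squared   # Adjusted R-squared
fit$sigma           # Residual standard error
```

Working with these components directly avoids copying values by hand from the console when you need them in a report or a later calculation.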
Common Pitfalls
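Common mistakes when training models in R include:
- Using variables that are not in the data frame, which causes an "object not found" error.
- Forgetting to specify the data argument, so R looks for the variables in the calling environment instead of your data frame.
- Misunderstanding formula syntax, e.g., using + for an interaction instead of :. In a formula, + only adds another main effect, while x1:x2 specifies the interaction.
Always check the variable names and the data frame before training.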
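One way to make that check routine is to compare the variables named in the formula against the columns of the data frame before calling lm(). The helper below, missing_vars, is a hypothetical function written for this sketch and is not part of base R; it relies only on all.vars(), names(), and setdiff().

```r
# Hypothetical helper: return formula variables that are absent from the data frame
missing_vars <- function(formula, data) {
  setdiff(all.vars(formula), names(data))
}

missing_vars(mpg ~ weight + hp, mtcars)  # "weight"
missing_vars(mpg ~ wt + hp, mtcars)      # character(0)
```

This is a rough check under simple assumptions: it would also flag the . shorthand and any variable you deliberately take from the surrounding environment, so treat a non-empty result as a prompt to look closer rather than as proof of a mistake.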
```r
## Wrong: variable not in data frame
# model <- lm(mpg ~ weight + hp, data = mtcars)  # 'weight' does not exist

## Right:
model <- lm(mpg ~ wt + hp, data = mtcars)
```
Quick Reference
| Function | Purpose | Example |
|---|---|---|
| lm() | Train linear regression model | lm(y ~ x1 + x2, data = df) |
| summary() | Show model details and statistics | summary(model) |
| predict() | Make predictions from model | predict(model, newdata) |
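As a concrete illustration of the predict() row, the sketch below scores two made-up cars with the mtcars model. The data frame new_cars and its values are hypothetical; the only requirement is that its column names match the predictors used in the formula (in mtcars, wt is recorded in units of 1000 lbs).

```r
model <- lm(mpg ~ wt + hp, data = mtcars)

# Hypothetical new observations; columns must be named like the predictors
new_cars <- data.frame(wt = c(2.5, 3.0), hp = c(100, 110))

# Point predictions of mpg, one per row of new_cars
pred <- predict(model, newdata = new_cars)
pred

# Same predictions with 95% prediction intervals (columns: fit, lwr, upr)
pred_int <- predict(model, newdata = new_cars,
                    interval = "prediction", level = 0.95)
pred_int
```

Passing newdata by name is worth the extra typing: if it is omitted, predict() returns the fitted values for the training data instead.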
Key Takeaways
- Use the lm() function with a formula and a data frame to train linear models in R.
- Always ensure that the variable names in the formula match those in your data frame.
- Use summary() to check model fit and coefficients after training.
- Common errors come from a missing data argument or wrong variable names.
- predict() lets you use the trained model to estimate values for new data.