
How to Train a Model in R: Simple Steps and Example

To train a linear regression model in R, call lm() with a formula and a data frame. lm() fits the model by ordinary least squares and returns a model object that you can inspect with summary() or use for predictions with predict().

Syntax

The basic syntax to train a linear regression model in R is:

  • lm(formula, data): Fits a linear model.
  • formula: Describes the relationship, e.g., y ~ x1 + x2.
  • data: The data frame containing variables.
r
model <- lm(y ~ x1 + x2, data = your_data)
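To make the syntax concrete, here is a minimal sketch on synthetic data. The data frame your_data, its columns, and the "true" coefficients (1, 2 and -3) are made up for illustration only.

```r
# Hypothetical data: y depends linearly on x1 and x2 plus random noise.
set.seed(42)
n <- 200
your_data <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
your_data$y <- 1 + 2 * your_data$x1 - 3 * your_data$x2 + rnorm(n, sd = 0.5)

# Fit the model; the formula reads "y explained by x1 and x2".
model <- lm(y ~ x1 + x2, data = your_data)

# The estimates should land close to the made-up values 1, 2 and -3.
coef(model)
```

Because the data were generated from a known linear relationship, comparing coef(model) with the values used to build y is a quick sanity check that the formula is doing what you expect.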

Example

This example shows how to train a linear regression model to predict mpg (miles per gallon) using wt (weight) and hp (horsepower) from the built-in mtcars dataset.

r
model <- lm(mpg ~ wt + hp, data = mtcars)
summary(model)
Output
Call:
lm(formula = mpg ~ wt + hp, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max
-3.9415 -1.6009 -0.1821  1.0509  5.8543

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 37.22727    1.59879  23.288  < 2e-16 ***
wt          -3.87783    0.63273  -6.126 1.12e-06 ***
hp          -0.03177    0.00903  -3.517  0.00145 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.593 on 29 degrees of freedom
Multiple R-squared:  0.8268,  Adjusted R-squared:  0.8148
F-statistic: 69.21 on 2 and 29 DF,  p-value: 9.109e-12
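The numbers printed by summary() can also be pulled out programmatically. The following is a small sketch using standard accessors on the same mtcars model; the variable name fit is just a label chosen here.

```r
model <- lm(mpg ~ wt + hp, data = mtcars)
fit <- summary(model)

# Named vector of estimates: (Intercept), wt, hp
coef(model)

# Full coefficient table as a matrix
# (columns: Estimate, Std. Error, t value, Pr(>|t|))
fit$coefficients

# Goodness-of-fit values
fit$r.squared      # multiple R-squared
fit$adj.r.squared  # adjusted R-squared
fit$sigma          # residual standard error
```

This is handy when you want to compare several models or store the fit statistics in a table instead of reading them off the printed summary.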

Common Pitfalls

Common mistakes when training models in R include:

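  • Using variables that are not in the data frame, which stops lm() with an "object not found" error.
  • Forgetting the data argument, so R looks for the variables in your workspace instead of your data frame.
  • Misunderstanding formula syntax, e.g., using + when an interaction is intended: + adds separate terms, : specifies an interaction, and * expands to both.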

Always check variable names and data frame before training.

r
## Wrong: variable not in data frame
# model <- lm(mpg ~ weight + hp, data = mtcars)  # 'weight' does not exist

## Right:
model <- lm(mpg ~ wt + hp, data = mtcars)
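As a rough sketch of how to catch these pitfalls before fitting, the snippet below checks a formula's variable names against the data frame and compares the coefficients produced by +, : and *. The names f, missing_vars, m_add, m_int and m_full are chosen here for illustration.

```r
# Check that every variable in the formula exists in the data frame.
f <- mpg ~ weight + hp
missing_vars <- setdiff(all.vars(f), names(mtcars))
missing_vars  # "weight"

# '+' adds separate terms, ':' is the interaction only, '*' expands to both.
m_add  <- lm(mpg ~ wt + hp, data = mtcars)
m_int  <- lm(mpg ~ wt:hp,  data = mtcars)
m_full <- lm(mpg ~ wt * hp, data = mtcars)

names(coef(m_add))   # "(Intercept)" "wt" "hp"
names(coef(m_int))   # "(Intercept)" "wt:hp"
names(coef(m_full))  # "(Intercept)" "wt" "hp" "wt:hp"
```

Looking at names(coef(...)) is a quick way to confirm which terms a formula actually produced.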

Quick Reference

Function    Purpose                              Example
lm()        Train linear regression model        lm(y ~ x1 + x2, data = df)
summary()   Show model details and statistics    summary(model)
predict()   Make predictions from model          predict(model, newdata)
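Since predict() is listed above without a worked call, here is a minimal sketch using the mtcars model. The data frame new_cars and its values are hypothetical; the only requirement is that its column names match the predictors in the formula.

```r
model <- lm(mpg ~ wt + hp, data = mtcars)

# Hypothetical new cars; columns must be named like the predictors (wt, hp).
new_cars <- data.frame(wt = c(2.5, 3.5), hp = c(100, 150))

# Point predictions of mpg, one per row of new_cars
preds <- predict(model, newdata = new_cars)
preds

# Same predictions with 95% prediction intervals (columns: fit, lwr, upr)
pred_int <- predict(model, newdata = new_cars,
                    interval = "prediction", level = 0.95)
pred_int
```

If newdata is omitted, predict() returns the fitted values for the rows the model was trained on.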

Key Takeaways

Use the lm() function with a formula and data frame to train linear models in R.
Always ensure variable names in the formula match those in your data frame.
Use summary() to check model fit and coefficients after training.
Most errors come from a missing data argument or misspelled variable names.
predict() lets you use the trained model to estimate new values.