
How to Use the lm() Function in R for Linear Regression

Use the lm() function in R to fit linear regression models by specifying a formula and a data frame. The formula describes the relationship between the dependent and independent variables, such as y ~ x, and lm() returns a fitted model object that you can inspect with functions such as summary().

Syntax

The basic syntax of the lm() function is:

r
lm(formula, data, ...)

  • formula: A symbolic description of the model, e.g., y ~ x means y depends on x.
  • data: The data frame containing the variables used in the formula.
  • other arguments: Optional settings such as subset or weights.
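The optional arguments can be sketched as follows. This is a minimal illustration on the built-in mtcars dataset; the names fit_subset, fit_weighted, and w are hypothetical, and the weights are deliberately all equal to 1, so the weighted fit should match an ordinary unweighted fit.

```r
# Fit the model only on 4-cylinder cars via the subset argument
fit_subset <- lm(mpg ~ wt, data = mtcars, subset = cyl == 4)

# Weighted least squares; these weights are hypothetical (all 1),
# so the coefficients equal those of the unweighted model
w <- rep(1, nrow(mtcars))
fit_weighted <- lm(mpg ~ wt, data = mtcars, weights = w)

nobs(fit_subset)     # number of observations actually used in the fit
coef(fit_weighted)   # intercept and slope
```

In practice, weights would come from knowledge about how reliable each observation is rather than a constant vector.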

Example

This example fits a linear model predicting mpg (miles per gallon) from wt (weight, in 1,000 lbs) in the built-in mtcars dataset. It shows how to create the model and view the summary.

r
model <- lm(mpg ~ wt, data = mtcars)
summary(model)
Output
Call:
lm(formula = mpg ~ wt, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.5432 -2.3647 -0.1252  1.4096  6.8727 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
wt           -5.3445     0.5591  -9.559 1.29e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.046 on 30 degrees of freedom
Multiple R-squared:  0.7528,	Adjusted R-squared:  0.7446 
F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10
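Beyond printing the summary, individual pieces of the fitted model can be pulled out directly. The sketch below uses standard accessor functions from base R; the data frame new_cars and its weights are hypothetical values chosen only to illustrate predict().

```r
model <- lm(mpg ~ wt, data = mtcars)

coef(model)                 # named vector: (Intercept) and wt
summary(model)$r.squared    # multiple R-squared as a single number
confint(model)              # 95% confidence intervals for the coefficients

# Predict mpg for two hypothetical cars (wt is in 1,000 lbs)
new_cars <- data.frame(wt = c(2.5, 3.5))
predicted <- predict(model, newdata = new_cars)
predicted
```

The column name in new_cars has to match the predictor name used in the formula, otherwise predict() cannot find it.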

Common Pitfalls

Common mistakes when using lm() include:

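  • Not specifying the data argument, so R searches the calling environment for the variables and usually stops with an "object not found" error.
  • Using incorrect formula syntax, such as leaving out the tilde ~ that separates the response from the predictors.
  • Fitting a model whose dependent variable is non-numeric (for example, a factor or character column) without first converting or recoding it.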

Always check your formula and data frame carefully.

r
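## Wrong: no data argument, so R looks for mpg and wt in the calling environment
## and stops with "object 'mpg' not found" (unless objects with those names exist)
model_wrong <- lm(mpg ~ wt)

## Right: pass the data frame that holds the variables
model_right <- lm(mpg ~ wt, data = mtcars)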
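For the non-numeric dependent variable pitfall listed above, one possible guard is to check the column type before fitting. The sketch below is only an illustration: cars_text is a hypothetical copy of mtcars in which mpg has been stored as text, as can happen after a careless import.

```r
# Hypothetical copy of mtcars where mpg was read in as text
cars_text <- mtcars
cars_text$mpg <- as.character(cars_text$mpg)

is.numeric(cars_text$mpg)   # FALSE: the response is stored as text

# Convert the response back to numeric before fitting
cars_text$mpg <- as.numeric(cars_text$mpg)
stopifnot(is.numeric(cars_text$mpg))

model_fixed <- lm(mpg ~ wt, data = cars_text)
coef(model_fixed)
```

Running str() on the data frame before modelling is another quick way to spot columns with an unexpected type.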

Quick Reference

| Argument  | Description                         |
|-----------|-------------------------------------|
| formula   | Model formula, e.g., y ~ x1 + x2    |
| data      | Data frame containing variables     |
| subset    | Optional subset of data to use      |
| weights   | Optional weights for observations   |
| na.action | How to handle missing values        |
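As a brief illustration of two entries from the table, the sketch below combines a formula with two predictors and an explicit na.action. The data frame cars_na is a hypothetical copy of mtcars with two weights set to NA purely for demonstration.

```r
# Hypothetical copy of mtcars with two missing weights
cars_na <- mtcars
cars_na$wt[c(1, 2)] <- NA

# Two predictors; na.omit drops rows with missing values before fitting
fit_multi <- lm(mpg ~ wt + hp, data = cars_na, na.action = na.omit)

nobs(fit_multi)          # rows left after the incomplete ones are dropped
names(coef(fit_multi))   # one coefficient per term, plus the intercept
```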

Key Takeaways

  • Use lm() with a formula and data frame to fit linear models in R.
  • The formula uses ~ to separate the dependent variable from the independent variables.
  • Always specify the data argument to avoid "object not found" errors.
  • Check the model summary to understand coefficients and fit quality.
  • Common errors include a missing data argument and incorrect formula syntax.