How to Use the lm() Function in R for Linear Regression
Use the lm() function in R to fit linear regression models by specifying a formula and a data frame. The formula describes the relationship between the dependent and independent variables, like y ~ x, and lm() returns a model object you can analyze.
Syntax
The basic syntax of the lm() function is:
```r
lm(formula, data, ...)
```
- formula: A symbolic description of the model, e.g., y ~ x means y depends on x.
- data: The data frame containing the variables used in the formula.
- other arguments: Optional settings like subset or weights.
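The optional arguments can be sketched with the built-in mtcars dataset. This is a minimal illustration, not a modeling recommendation: the choice of `cyl < 8` as the subset and `hp` as the weights is an assumption made purely to show where each argument goes.

```r
# Fit on a subset of rows: only cars with fewer than 8 cylinders
model_subset <- lm(mpg ~ wt, data = mtcars, subset = cyl < 8)

# Weighted least squares; hp is used as the weight only for illustration
model_weighted <- lm(mpg ~ wt, data = mtcars, weights = hp)

nobs(model_subset)    # number of rows actually used in the fit
coef(model_weighted)  # intercept and slope under the chosen weights
```

Both `subset` and `weights` are evaluated inside `data`, so column names can be used directly without the `mtcars$` prefix.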
Example
This example fits a linear model predicting mpg (miles per gallon) from wt (weight) in the built-in mtcars dataset. It shows how to create the model and view the summary.
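```r
# Fit mpg as a linear function of wt
model <- lm(mpg ~ wt, data = mtcars)

# Print the coefficients, residual summary, and fit statistics
summary(model)
```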
Output
```
Call:
lm(formula = mpg ~ wt, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max
-4.5432 -2.3651 -0.1252  1.4103  6.8727

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  37.2851     1.8776  19.857  < 2e-16 ***
wt           -5.3445     0.5591  -9.559 1.29e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.046 on 30 degrees of freedom
Multiple R-squared:  0.7528,  Adjusted R-squared:  0.7446
F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10
```
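Beyond printing the summary, the fitted model object can be queried directly. The sketch below shows a few standard extractor functions from base R; the `new_cars` data frame is a made-up input used only to demonstrate prediction.

```r
model <- lm(mpg ~ wt, data = mtcars)

coef(model)                   # named vector: (Intercept) and wt
confint(model, level = 0.95)  # 95% confidence intervals for the coefficients
summary(model)$r.squared      # Multiple R-squared as a plain number
summary(model)$coefficients   # matrix of estimates, std. errors, t and p values

# Predict mpg for two hypothetical cars (wt is measured in 1000 lb)
new_cars <- data.frame(wt = c(2.5, 3.5))
predict(model, newdata = new_cars)
```

The column names in `newdata` must match the predictor names used in the formula, otherwise `predict()` cannot find them.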
Common Pitfalls
Common mistakes when using lm() include:
- Not specifying the data argument, so R looks for the variables in the formula's environment (usually your workspace) instead of in your data frame.
- Using incorrect formula syntax, like missing the tilde ~.
- Trying to predict with non-numeric dependent variables without proper handling.
Always check your formula and data frame carefully.
```r
## Wrong: missing data argument
## Fails with "object 'mpg' not found" unless mpg and wt exist in your workspace
# model_wrong <- lm(mpg ~ wt)

## Right: specify data
model_right <- lm(mpg ~ wt, data = mtcars)
```
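One way to catch these pitfalls early is to check the formula against the data frame before fitting. The helper below, `check_lm_inputs`, is hypothetical (it is not part of R) and is only a sketch: it assumes a two-sided formula that names its columns explicitly, so it would reject shorthand such as `y ~ .`.

```r
# Hypothetical helper: validate a formula against a data frame before fitting
check_lm_inputs <- function(formula, data) {
  # Require a two-sided formula (response ~ predictors) and a data frame
  stopifnot(inherits(formula, "formula"), length(formula) == 3,
            is.data.frame(data))

  # Every variable named in the formula must be a column of `data`
  missing_vars <- setdiff(all.vars(formula), names(data))
  if (length(missing_vars) > 0) {
    stop("Not found in data: ", paste(missing_vars, collapse = ", "))
  }

  # The left-hand side should be numeric for ordinary linear regression
  response <- all.vars(formula[[2]])
  if (!all(vapply(data[response], is.numeric, logical(1)))) {
    stop("Response is not numeric: ", paste(response, collapse = ", "))
  }
  invisible(TRUE)
}

check_lm_inputs(mpg ~ wt, mtcars)  # passes silently
model_checked <- lm(mpg ~ wt, data = mtcars)
```

This check is deliberately stricter than lm() itself, which would also accept variables defined outside the data frame.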
Quick Reference
| Argument | Description |
|---|---|
| formula | Model formula, e.g., y ~ x1 + x2 |
| data | Data frame containing variables |
| subset | Optional subset of data to use |
| weights | Optional weights for observations |
| na.action | How to handle missing values |
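
The effect of na.action is easiest to see on data with gaps. The sketch below copies mtcars into a hypothetical `cars_na` data frame and blanks out two mpg values (made-up gaps, for illustration only), then compares `na.omit`, the usual default, with `na.exclude`.

```r
# Copy mtcars and introduce two artificial missing values
cars_na <- mtcars
cars_na$mpg[c(3, 10)] <- NA

# na.omit: incomplete rows are dropped before fitting
fit_omit <- lm(mpg ~ wt, data = cars_na, na.action = na.omit)

# na.exclude: same fit, but residuals and fitted values are padded with NA
fit_excl <- lm(mpg ~ wt, data = cars_na, na.action = na.exclude)

length(residuals(fit_omit))  # one residual per row used in the fit
length(residuals(fit_excl))  # one entry per row of cars_na, NA where mpg is missing
```

Both fits give the same coefficients; `na.exclude` is mainly convenient when residuals need to line up row by row with the original data frame.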
Key Takeaways
Use lm() with a formula and data frame to fit linear models in R.
The formula uses ~ to separate dependent and independent variables.
Always specify the data argument to avoid "object not found" errors.
Check the model summary to understand coefficients and fit quality.
Common errors include a missing data argument or incorrect formula syntax.