How to Use the glm() Function in R: Syntax and Examples
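In R, use the glm() function to fit generalized linear models (GLMs) by specifying a formula, a data frame, and a family such as binomial or poisson. GLMs extend ordinary linear regression to responses that are not normally distributed, such as binary outcomes or counts. They do this by pairing an error distribution with a link function that connects the mean response to the linear predictor.
Syntax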
The basic syntax of glm() is:
```r
glm(formula, family = gaussian(), data, ...)
```
- formula: describes the model, e.g., `response ~ predictors`
- family: specifies the error distribution and link function, e.g., `binomial` for logistic regression. The default, gaussian, gives ordinary linear regression.
- data: the data frame containing the variables
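Two consequences of this signature can be checked directly. First, with the default gaussian family, glm() should reproduce lm(). Second, family can be passed as a function, a call, or a character string. The sketch below assumes only base R and the built-in mtcars data set. The object names fit_glm, fit_lm, and fit_a to fit_c are illustrative.

```r
# With the default gaussian family (identity link), glm() fits the same
# model as lm(); mtcars is a data set that ships with R.
fit_glm <- glm(mpg ~ wt, data = mtcars)
fit_lm  <- lm(mpg ~ wt, data = mtcars)
print(all.equal(coef(fit_glm), coef(fit_lm)))  # TRUE

# The family argument accepts a family function, a call to one, or its name
# as a string; these three logistic fits are equivalent.
fit_a <- glm(am ~ wt, family = binomial,   data = mtcars)
fit_b <- glm(am ~ wt, family = binomial(), data = mtcars)
fit_c <- glm(am ~ wt, family = "binomial", data = mtcars)
print(family(fit_c))
```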
Example
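This example fits a logistic regression model that predicts whether a student passes from the number of hours studied.
```r
# Toy data (1 = pass, 0 = fail). Passes and failures overlap in hours;
# perfectly separated data would make glm() warn and the estimates diverge.
data <- data.frame(
  pass  = c(1, 0, 1, 1, 0, 0, 1, 0),
  hours = c(3, 2, 6, 7, 1, 5, 8, 2)
)

# family = binomial() uses the logit link by default
model <- glm(pass ~ hours, family = binomial(), data = data)
summary(model)
```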
Output
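Running summary(model) prints the following:
- the model call
- a coefficient table with Estimate, Std. Error, z value, and Pr(>|z|) columns for (Intercept) and hours
- the line "(Dispersion parameter for binomial family taken to be 1)"
- the null and residual deviance
- the AIC
- the number of Fisher scoring iterations

Here the null deviance is 11.090 on 7 degrees of freedom, because four of the eight students passed. The residual deviance has 6 degrees of freedom after estimating two coefficients. A positive estimate for hours means each extra hour of study increases the log-odds of passing. For a binomial model with 0/1 responses, the AIC equals the residual deviance plus 2 times the number of coefficients.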
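Because family = binomial() uses the logit link, the coefficient estimates are changes in log-odds. The sketch below converts them to odds ratios with exp() and to predicted probabilities with predict(type = "response"). It assumes hypothetical toy values, chosen so that passes and failures overlap, and the illustrative names students and new_students.

```r
# Hypothetical toy data: hours studied and pass (1) / fail (0)
students <- data.frame(
  pass  = c(1, 0, 1, 1, 0, 0, 1, 0),
  hours = c(3, 2, 6, 7, 1, 5, 8, 2)
)
model <- glm(pass ~ hours, family = binomial(), data = students)

# Coefficients are on the log-odds scale; exponentiating gives odds ratios
odds_ratios <- exp(coef(model))
print(odds_ratios)

# Predicted probability of passing for new study times
new_students <- data.frame(hours = c(2, 4, 6))
p_pass <- predict(model, newdata = new_students, type = "response")
print(round(p_pass, 3))
```

Without type = "response", predict() returns values on the link (log-odds) scale. plogis() maps those back to probabilities.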
Common Pitfalls
Common mistakes include:
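- Not specifying the correct `family` for your data (e.g., using `gaussian` for binary data instead of `binomial`).
- Treating categorical predictors incorrectly in the formula. A category stored as numbers (such as 1, 2, 3) is fitted as a single continuous slope unless it is wrapped in factor(). Factor coefficients are also interpreted relative to the reference level.
- Ignoring warnings such as "glm.fit: algorithm did not converge" or "glm.fit: fitted probabilities numerically 0 or 1 occurred". A related mistake is not checking count models for overdispersion, which glm() does not warn about.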
```r
## Wrong: the gaussian family treats a 0/1 outcome as continuous (a linear probability model)
model_wrong <- glm(pass ~ hours, family = gaussian(), data = data)

## Right: the binomial family models the probability of passing
model_right <- glm(pass ~ hours, family = binomial(), data = data)
```
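The wrong-family and overdispersion pitfalls can be seen in a short sketch. It assumes the hypothetical students data below and R's built-in warpbreaks data. The Pearson-based dispersion ratio is one common rule-of-thumb check, not the only one, and quasipoisson() is one possible remedy rather than the definitive fix.

```r
# Hypothetical toy data in which passes and failures overlap in hours studied
students <- data.frame(
  pass  = c(1, 0, 1, 1, 0, 0, 1, 0),
  hours = c(3, 2, 6, 7, 1, 5, 8, 2)
)

# A gaussian fit treats the 0/1 outcome as continuous, so fitted values can
# leave [0, 1]; the binomial fit's fitted probabilities cannot.
lpm_fit   <- glm(pass ~ hours, family = gaussian(), data = students)
logit_fit <- glm(pass ~ hours, family = binomial(), data = students)
print(range(fitted(lpm_fit)))
print(range(fitted(logit_fit)))

# glm() does not warn about overdispersion, so check count models yourself.
# warpbreaks ships with R; breaks is a count of yarn breaks.
pois_fit <- glm(breaks ~ wool + tension, family = poisson(), data = warpbreaks)
# Pearson chi-square divided by residual df; values well above 1 suggest overdispersion
dispersion <- sum(residuals(pois_fit, type = "pearson")^2) / df.residual(pois_fit)
print(dispersion)

# quasipoisson keeps the same coefficients but estimates the dispersion
# instead of fixing it at 1, rescaling standard errors by its square root
quasi_fit <- glm(breaks ~ wool + tension, family = quasipoisson(), data = warpbreaks)
print(summary(quasi_fit)$dispersion)
```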
Quick Reference
| Argument | Description | Example |
|---|---|---|
| formula | Model formula describing response and predictors | pass ~ hours + age |
| family | Error distribution and link function | binomial, poisson, gaussian |
| data | Data frame containing variables | data = mydata |
| subset | Subset of data to use | subset = age > 20 |
| weights | Prior weights, one per observation; for a binomial proportion response, the number of trials | weights = c(1,2,1,1) |
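The weights and subset arguments are easiest to see with grouped binomial data. This sketch assumes hypothetical pass counts per study-time group, and the names grouped and fit_* are illustrative. It shows two things. A proportion response with weights equal to the number of trials fits the same model as a cbind(successes, failures) response. And subset is evaluated inside data, so columns can be referenced by name.

```r
# Hypothetical aggregated counts: students passing out of those enrolled,
# grouped by hours studied
grouped <- data.frame(
  hours    = c(1, 2, 3, 5, 6, 7, 8),
  enrolled = c(10, 12, 9, 11, 8, 10, 7),
  passed   = c(1, 3, 3, 6, 6, 8, 6)
)

# Proportion response with weights = number of trials ...
fit_weights <- glm(passed / enrolled ~ hours, family = binomial(),
                   weights = enrolled, data = grouped)
# ... gives the same fit as a two-column (successes, failures) response
fit_cbind <- glm(cbind(passed, enrolled - passed) ~ hours,
                 family = binomial(), data = grouped)
print(all.equal(coef(fit_weights), coef(fit_cbind)))  # TRUE

# subset keeps only the rows where the condition holds (here, 6 of 7 groups)
fit_subset <- glm(cbind(passed, enrolled - passed) ~ hours,
                  family = binomial(), data = grouped, subset = hours > 1)
print(nrow(model.frame(fit_subset)))
```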
Key Takeaways
Use glm() to fit generalized linear models by specifying formula, family, and data.
Choose the correct family (e.g., binomial for binary outcomes) to get valid results.
Check summary(model) to read the coefficients, which are on the link scale (log-odds for binomial), and fit measures such as deviance and AIC.
Avoid common mistakes like wrong family or ignoring warnings.
Use the quick reference table to remember key glm() arguments.