
How to Use the glm() Function in R: Syntax and Examples

In R, use the glm() function to fit generalized linear models by specifying a formula, a data frame, and a family (such as binomial or poisson). It extends ordinary linear regression to responses whose errors are not normally distributed, such as binary outcomes or counts.

Syntax

The basic syntax of glm() is:

r
glm(formula, family = gaussian(), data, ...)

  • formula: describes the model, e.g., response ~ predictors
  • family: specifies the error distribution and link function, e.g., binomial for logistic regression; it defaults to gaussian
  • data: the data frame containing the variables
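
The role of the family argument can be sketched as follows. The tickets data frame and its columns are hypothetical, invented only for this illustration. The sketch fits the same count model with the link left at its default and named explicitly, then shows that leaving family out altogether gives the gaussian default, which matches lm().

```r
# Hypothetical count data, made up for illustration only
tickets <- data.frame(
  count = c(2, 3, 6, 7, 8, 9, 10, 12, 15, 20),
  size  = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
)

# Poisson regression; poisson() uses the log link by default
fit_pois <- glm(count ~ size, family = poisson(), data = tickets)

# The same model with the link named explicitly
fit_pois_log <- glm(count ~ size, family = poisson(link = "log"), data = tickets)

# The fitted object records which distribution and link were used
fit_pois$family$family
fit_pois$family$link

# Omitting family falls back to gaussian, i.e. ordinary least squares
fit_default <- glm(count ~ size, data = tickets)
fit_lm      <- lm(count ~ size, data = tickets)
all.equal(coef(fit_default), coef(fit_lm))
```

Naming the link explicitly is optional here, but it makes the choice visible when a non-default link such as binomial(link = "probit") is wanted.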

Example

This example fits a logistic regression model that predicts whether a student passes from the number of hours studied.

r
data <- data.frame(
  pass = c(1, 0, 1, 1, 0, 0, 1, 0),
  hours = c(5, 2, 6, 7, 1, 3, 8, 2)
)
model <- glm(pass ~ hours, family = binomial(), data = data)
summary(model)
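Output (illustrative)
Call:
glm(formula = pass ~ hours, family = binomial(), data = data)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -4.1589     2.3456  -1.773   0.0763 .
hours         1.1234     0.4567   2.461   0.0138 *

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 11.090  on 7  degrees of freedom
Residual deviance:  4.321  on 6  degrees of freedom
AIC: 10.321

Number of Fisher Scoring iterations: 5

Note: these figures only illustrate the layout of the summary() output. In the sample data every student with 5 or more hours passes and every student with 3 or fewer fails, so hours separates the two outcomes perfectly. Running the code as written therefore produces the warning "glm.fit: fitted probabilities numerically 0 or 1 occurred", together with very large estimates and standard errors rather than the values shown.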
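
Logistic regression coefficients are on the log-odds scale, so exponentiating them gives odds ratios, and predict() with type = "response" returns probabilities. A minimal sketch, assuming a hypothetical data frame named study (invented here, with some overlap between passing and failing students so that the estimates are finite):

```r
# Hypothetical data with overlapping outcomes, made up for illustration only
study <- data.frame(
  hours = c(1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 7, 8),
  pass  = c(0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1)
)

fit <- glm(pass ~ hours, family = binomial(), data = study)

# Coefficients are log-odds; exp() converts them to odds ratios
odds_ratios <- exp(coef(fit))
odds_ratios

# Predicted pass probabilities for students who study 2 and 6 hours
new_students <- data.frame(hours = c(2, 6))
probs <- predict(fit, newdata = new_students, type = "response")
probs
```

Without type = "response", predict() returns values on the link (log-odds) scale, which is easy to misread as probabilities.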

Common Pitfalls

Common mistakes include:

  • Not specifying the correct family for your data (e.g., using gaussian for binary data instead of binomial).
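  • Using factors incorrectly in the formula, for example leaving a categorical predictor coded as numbers, so R fits a single slope instead of one coefficient per level and the results are misinterpreted.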
  • Ignoring warnings about convergence or overdispersion.
r
## Wrong: Using gaussian family for binary outcome
model_wrong <- glm(pass ~ hours, family = gaussian(), data = data)

## Right: Use binomial family for binary outcome
model_right <- glm(pass ~ hours, family = binomial(), data = data)
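
The third pitfall, ignored warnings about convergence or overdispersion, can be checked in code. The sketch below is one possible approach, not the only one, and both data frames (sep and counts) are hypothetical. It first collects the warnings raised by a perfectly separated logistic fit, then computes a common rule-of-thumb dispersion estimate (Pearson chi-square divided by residual degrees of freedom) for a Poisson fit.

```r
# Hypothetical, perfectly separated data: every case with x >= 4 is a success
sep <- data.frame(y = c(0, 0, 0, 1, 1, 1), x = c(1, 2, 3, 4, 5, 6))

# Collect warnings in a vector instead of letting them scroll past
msgs <- character(0)
fit_sep <- withCallingHandlers(
  glm(y ~ x, family = binomial(), data = sep),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
msgs               # warnings raised while fitting
fit_sep$converged  # TRUE or FALSE: did the fitting iterations converge?

# Hypothetical counts that vary more than a Poisson model expects
counts <- data.frame(y = c(0, 1, 0, 12, 2, 25, 1, 30, 3, 40), x = 1:10)
fit_counts <- glm(y ~ x, family = poisson(), data = counts)

# Pearson chi-square / residual df; values well above 1 suggest overdispersion
dispersion <- sum(residuals(fit_counts, type = "pearson")^2) / df.residual(fit_counts)
dispersion

# quasipoisson estimates the dispersion instead of fixing it at 1
fit_quasi <- glm(y ~ x, family = quasipoisson(), data = counts)
summary(fit_quasi)$dispersion
```

The quasipoisson fit keeps the same coefficient estimates as the Poisson fit but scales the standard errors by the estimated dispersion.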

Quick Reference

Argument   Description                                        Example
formula    Model formula describing response and predictors  pass ~ hours + age
family     Error distribution and link function               binomial, poisson, gaussian
data       Data frame containing variables                    data = mydata
subset     Subset of data to use                              subset = age > 20
weights    Weights for observations                           weights = c(1,2,1,1)
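
The subset and weights arguments in the table can be sketched with grouped binomial data. The trials data frame below is hypothetical. One common use of weights with the binomial family is to supply the number of trials behind each observed proportion, and the sketch compares that with the equivalent cbind(successes, failures) response.

```r
# Hypothetical grouped data: successes out of n trials per dose level
trials <- data.frame(
  dose      = c(1, 2, 3, 4, 5, 6),
  n         = c(20, 20, 20, 20, 20, 20),
  successes = c(2, 5, 9, 12, 16, 19)
)
trials$prop <- trials$successes / trials$n

# Proportion as the response, number of trials as weights
fit_w <- glm(prop ~ dose, family = binomial(), data = trials, weights = n)

# Equivalent two-column response: cbind(successes, failures)
fit_cbind <- glm(cbind(successes, n - successes) ~ dose,
                 family = binomial(), data = trials)
all.equal(coef(fit_w), coef(fit_cbind))

# subset restricts the fit to rows that satisfy a condition
fit_sub <- glm(prop ~ dose, family = binomial(), data = trials,
               weights = n, subset = dose > 1)
nobs(fit_sub)
```

Both subset and weights are evaluated inside data, so column names such as dose and n can be used directly.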

Key Takeaways

Use glm() to fit generalized linear models by specifying formula, family, and data.
Choose the correct family (e.g., binomial for binary outcomes) to get valid results.
Check the model summary with summary() to understand the coefficients and model fit.
Avoid common mistakes like wrong family or ignoring warnings.
Use the quick reference table to remember key glm() arguments.