How to Perform Logistic Regression in R: Simple Guide
To perform logistic regression in R, use the glm() function with family = binomial. This fits a model that predicts a binary outcome from one or more predictors. For example, glm(y ~ x1 + x2, family = binomial, data = mydata) fits a logistic regression of y on x1 and x2.
Syntax
The basic syntax for logistic regression in R uses the glm() function with three main arguments:
- formula: specifies the response and predictors, e.g., y ~ x1 + x2.
- family = binomial: tells R to fit a logistic regression model (the binomial family uses the logit link by default).
- data: the data frame containing the variables.
This fits a model predicting the probability of the binary outcome y. The general form of the call is:
r
glm(formula, family = binomial, data = dataset)
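To make the fitted quantity concrete: a logistic regression models the log-odds of the outcome as a linear function of the predictors, and the logistic function maps that value back to a probability. The sketch below illustrates this with made-up coefficients (b0 and b1 are hypothetical values, not taken from any fitted model).

```r
# Hypothetical coefficients, chosen only for illustration
b0 <- -4   # intercept: log-odds of the outcome when x = 0
b1 <- 1    # change in log-odds for each one-unit increase in x

x <- c(2, 4, 6)

# Linear predictor on the log-odds scale
log_odds <- b0 + b1 * x

# plogis() is base R's logistic (inverse-logit) function, 1 / (1 + exp(-z))
prob <- plogis(log_odds)

print(round(prob, 3))   # 0.119 0.500 0.881
```

A log-odds of 0 corresponds to a probability of 0.5, which is why the middle value sits exactly at one half.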
Example
This example shows how to fit a logistic regression model predicting if a student passes (1) or fails (0) based on hours studied and attendance rate.
r
# Small example dataset: one row per student
data <- data.frame(
  pass = c(1, 0, 1, 0, 1, 0, 1, 1),
  hours = c(5, 2, 6, 1, 7, 3, 8, 9),
  attendance = c(0.9, 0.6, 0.8, 0.5, 0.95, 0.4, 0.85, 0.9)
)
# Fit the logistic regression and print the model summary
model <- glm(pass ~ hours + attendance, family = binomial, data = data)
summary(model)
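Output
The values below illustrate the layout of summary() output; running the code yourself will give different numbers. In this small dataset, hours and attendance each separate the passing students from the failing ones perfectly, so R will typically warn that fitted probabilities numerically 0 or 1 occurred and report very large estimates and standard errors.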
Call:
glm(formula = pass ~ hours + attendance, family = binomial, data = data)
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -7.1234     4.5678  -1.560   0.1185
hours         1.2345     0.6789   1.818   0.0690 .
attendance    3.4567     2.3456   1.474   0.1403
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 11.090 on 7 degrees of freedom
Residual deviance: 4.321 on 5 degrees of freedom
AIC: 12.321
Number of Fisher Scoring iterations: 6
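Once a model is fitted, the usual next step is to turn it into predicted probabilities. The following is a minimal sketch of that step with predict(); the study data frame and the new_students values are hypothetical, made up so that passes and fails overlap across hours and the fit is well behaved. It uses a single predictor to keep the example short.

```r
# Hypothetical data, made up for illustration; outcomes overlap across hours
study <- data.frame(
  pass  = c(0, 0, 1, 0, 0, 1, 0, 1, 1, 1),
  hours = c(1, 2, 3, 3, 4, 5, 6, 6, 7, 8)
)

fit <- glm(pass ~ hours, family = binomial, data = study)

# type = "response" returns probabilities; the default ("link") returns log-odds
new_students <- data.frame(hours = c(2, 7))
p_hat <- predict(fit, newdata = new_students, type = "response")

# Convert probabilities to class labels with a 0.5 cutoff (a common default,
# not a requirement; choose the cutoff to suit the application)
predicted_pass <- as.integer(p_hat > 0.5)

print(p_hat)
print(predicted_pass)
```

Without type = "response", predict() returns values on the log-odds scale, which are easy to mistake for probabilities.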
Common Pitfalls
Common mistakes when performing logistic regression in R include:
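- Not setting family = binomial. glm() then defaults to the gaussian family and fits an ordinary linear regression instead.
- Using a non-binary response variable. The response should be coded 0/1 or be a factor with two levels.
- Leaving categorical predictors as numeric codes instead of converting them with factor(), so R treats them as continuous.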
- Interpreting coefficients directly as probabilities instead of log-odds.
r
## Wrong: missing family argument (defaults to gaussian, i.e. linear regression)
model_wrong <- glm(pass ~ hours + attendance, data = data)

## Right: specify family = binomial
model_right <- glm(pass ~ hours + attendance, family = binomial, data = data)
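The first and last pitfalls can be checked directly in code. The sketch below, which uses a hypothetical study data frame made up for illustration, shows one way to confirm which family was fitted, how the two fits differ on a prediction outside the observed range, and how to convert log-odds coefficients to odds ratios.

```r
# Hypothetical data, made up for illustration
study <- data.frame(
  pass  = c(0, 0, 1, 0, 0, 1, 0, 1, 1, 1),
  hours = c(1, 2, 3, 3, 4, 5, 6, 6, 7, 8)
)

fit_wrong <- glm(pass ~ hours, data = study)                     # family omitted
fit_right <- glm(pass ~ hours, family = binomial, data = study)

# family() reports which model was actually fitted
print(family(fit_wrong)$family)   # "gaussian"
print(family(fit_right)$family)   # "binomial"

# Far outside the observed range, the linear fit leaves the [0, 1] interval,
# while the logistic fit always returns a valid probability
far <- data.frame(hours = 12)
pred_wrong <- predict(fit_wrong, newdata = far, type = "response")
pred_right <- predict(fit_right, newdata = far, type = "response")
print(c(linear = unname(pred_wrong), logistic = unname(pred_right)))

# Coefficients are log-odds; exponentiate to read them as odds ratios
print(exp(coef(fit_right)))
```

An odds ratio above 1 for hours means each additional hour multiplies the odds of passing by that factor; it does not mean the probability rises by that amount.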
Quick Reference
| Term | Description |
|---|---|
| glm() | Function to fit generalized linear models |
| formula | Defines response and predictors, e.g., y ~ x1 + x2 |
| family = binomial | Specifies logistic regression for binary outcomes |
| data | Dataset containing variables |
| summary() | Shows model details and statistics |
Key Takeaways
- Use glm() with family = binomial to fit logistic regression in R.
- Ensure the response variable is binary (0/1 or a factor with two levels).
- Interpret coefficients as log-odds, not direct probabilities.
- Always check the model summary for coefficient significance and overall fit.
- Always specify the family argument; omitting it fits a linear model instead.
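As one possible way to check significance and fit beyond reading summary(), the sketch below extracts the coefficient table and runs a likelihood-ratio test against an intercept-only model. The study data frame is hypothetical, made up for illustration, and this is only one of several reasonable checks.

```r
# Hypothetical data, made up for illustration
study <- data.frame(
  pass  = c(0, 0, 1, 0, 0, 1, 0, 1, 1, 1),
  hours = c(1, 2, 3, 3, 4, 5, 6, 6, 7, 8)
)

fit <- glm(pass ~ hours, family = binomial, data = study)

# Wald z-tests for each coefficient (the table printed by summary())
print(coef(summary(fit)))

# Likelihood-ratio test: does the model improve on an intercept-only fit?
lr_stat <- fit$null.deviance - deviance(fit)   # drop in deviance
lr_df   <- fit$df.null - fit$df.residual       # number of added parameters
lr_p    <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
print(c(statistic = lr_stat, df = lr_df, p_value = lr_p))

# AIC, useful for comparing candidate models fitted to the same data
print(AIC(fit))
```

A small p-value from the likelihood-ratio test suggests the predictors explain the outcome better than the intercept alone; with samples this small, treat any such result with caution.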