
How to Perform Logistic Regression in R: Simple Guide

To perform logistic regression in R, call glm() with family = binomial. This fits a model for the probability of a binary outcome given one or more predictors. For example, glm(y ~ x1 + x2, family = binomial, data = mydata) regresses the binary variable y on x1 and x2.

Syntax

The basic syntax for logistic regression in R uses the glm() function:

  • formula: specifies the response and predictors, e.g., y ~ x1 + x2.
  • family = binomial: selects the binomial family, whose default logit link makes the model a logistic regression.
  • data: the data frame containing the variables named in the formula.

This fits a model predicting the probability of the binary outcome y.

r
glm(formula, family = binomial, data = dataset)
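
As a minimal sketch of this call, the following fits a model on R's built-in mtcars data set and checks that family = binomial uses the logit link by default. The data set and the names fit_short and fit_long are assumptions chosen for illustration; they are not part of this guide's own example.

```r
# Illustrative only: mtcars ships with R, and am (0 = automatic, 1 = manual)
# serves as the binary response here.
fit_short <- glm(am ~ wt, family = binomial, data = mtcars)

# Spelling out the link gives the same model, because logit is the default
# link of the binomial family.
fit_long <- glm(am ~ wt, family = binomial(link = "logit"), data = mtcars)

family(fit_short)$link                       # link function in use
all.equal(coef(fit_short), coef(fit_long))   # both calls give the same coefficients
```

Writing the link explicitly is only needed when a different one is wanted, for example binomial(link = "probit").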

Example

This example shows how to fit a logistic regression model predicting if a student passes (1) or fails (0) based on hours studied and attendance rate.

r
data <- data.frame(
  pass = c(1, 0, 1, 0, 1, 0, 1, 1),
  hours = c(5, 2, 6, 1, 7, 3, 8, 9),
  attendance = c(0.9, 0.6, 0.8, 0.5, 0.95, 0.4, 0.85, 0.9)
)

model <- glm(pass ~ hours + attendance, family = binomial, data = data)
summary(model)
Output
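Call:
glm(formula = pass ~ hours + attendance, family = binomial, data = data)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -7.1234     4.5678  -1.560   0.1185
hours         1.2345     0.6789   1.818   0.0690 .
attendance    3.4567     2.3456   1.474   0.1403

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 11.090  on 7  degrees of freedom
Residual deviance:  4.321  on 5  degrees of freedom
AIC: 12.321

Number of Fisher Scoring iterations: 6

These figures illustrate the layout of summary() output only. In this eight-row data set, every student with 5 or more hours passed and every student with 3 or fewer failed, so hours separates the outcomes perfectly. When the code is actually run, R typically warns that fitted probabilities numerically 0 or 1 occurred, and the estimates and standard errors it reports are very large and not meaningful. A usable fit needs more data, with some overlap between the two groups.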
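
Once a model is fitted, predict() with type = "response" returns probabilities rather than log-odds. The sketch below assumes a hypothetical simulated data set (sim, generated with made-up coefficients -2.5 and 0.5) instead of the eight students above, so that there are enough observations for stable estimates.

```r
set.seed(42)  # make the simulation reproducible

# Hypothetical data: 200 students whose chance of passing rises with hours studied.
n <- 200
hours <- runif(n, min = 0, max = 10)
pass <- rbinom(n, size = 1, prob = plogis(-2.5 + 0.5 * hours))
sim <- data.frame(pass = pass, hours = hours)

sim_model <- glm(pass ~ hours, family = binomial, data = sim)

# Predicted probability of passing for three new students.
new_students <- data.frame(hours = c(2, 5, 8))
probs <- predict(sim_model, newdata = new_students, type = "response")
probs

# The same values by hand: apply the inverse logit to the linear predictor.
by_hand <- plogis(predict(sim_model, newdata = new_students, type = "link"))
by_hand
```

The default type = "link" returns values on the log-odds scale, which is why the inverse logit (plogis) is needed to recover probabilities.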

Common Pitfalls

Common mistakes when performing logistic regression in R include:

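  • Omitting family = binomial: glm() then defaults to family = gaussian and fits an ordinary linear regression instead.
  • Using a response that is not binary. The response should be 0/1, logical, or a factor with two levels.
  • Leaving categorical variables coded as numbers, so R treats them as continuous; convert them with factor() first. For a factor response, R models the probability of the second level.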
  • Interpreting coefficients directly as probabilities instead of log-odds.
r
## Wrong: missing family argument
model_wrong <- glm(pass ~ hours + attendance, data = data)

## Right: specify family = binomial
model_right <- glm(pass ~ hours + attendance, family = binomial, data = data)
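
The pitfalls listed above can be checked directly. This sketch uses the built-in mtcars data as a stand-in (an assumption for illustration, with am as the binary response); the names fit_default, fit_logit, fit_factor, and gearbox are hypothetical.

```r
# Without family, glm() falls back to its default, gaussian (ordinary linear regression).
fit_default <- glm(am ~ wt, data = mtcars)
family(fit_default)$family

# With family = binomial, the model is a logistic regression.
fit_logit <- glm(am ~ wt, family = binomial, data = mtcars)
family(fit_logit)$family

# Coefficients are on the log-odds scale; exponentiate them to get odds ratios.
coef(fit_logit)
exp(coef(fit_logit))

# Probabilities come from the inverse logit of the whole linear predictor,
# not from any single coefficient.
head(fitted(fit_logit))

# A two-level factor response also works: the first level is treated as
# "failure", so R models the probability of the second level ("manual").
mt <- transform(mtcars,
                gearbox = factor(am, levels = c(0, 1),
                                 labels = c("automatic", "manual")))
fit_factor <- glm(gearbox ~ wt, family = binomial, data = mt)
all.equal(coef(fit_factor), coef(fit_logit))
```

An odds ratio below 1 means the odds of the outcome fall as the predictor increases; above 1 means they rise.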

Quick Reference

Term               Description
glm()              Function to fit generalized linear models
formula            Defines response and predictors, e.g., y ~ x1 + x2
family = binomial  Specifies logistic regression for binary outcomes
data               Dataset containing variables
summary()          Shows model details and statistics

Key Takeaways

  • Use glm() with family = binomial to fit logistic regression in R.
  • Ensure the response variable is binary (0/1 or a factor with two levels).
  • Interpret coefficients as log-odds, not direct probabilities; exponentiate them to get odds ratios.
  • Check the model summary for coefficient significance and deviance.
  • If the family argument is omitted, glm() defaults to gaussian and fits a linear model instead.