How to Perform Chi-Square Test in R: Syntax and Example
To perform a chi-square test in R, use the chisq.test() function with a contingency table or a vector of observed counts. This function tests whether there is a significant association between categorical variables, or whether observed frequencies differ from expected frequencies.
Syntax
The basic syntax of the chi-square test in R is:
```r
chisq.test(x, y = NULL, correct = TRUE)
```
Where:
- x is a contingency table (matrix or table) or a vector of observed counts.
- y is an optional second factor, used only when x is also a vector or factor; it is ignored if x is a matrix. Expected proportions for a goodness-of-fit test are supplied through the separate p argument, not y.
- correct applies Yates' continuity correction to 2x2 tables; it defaults to TRUE.
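The two uses of chisq.test() mentioned above (comparing observed counts with expected frequencies, and testing two factors for association) can be sketched as follows. The counts, proportions, and variable names are made up for illustration; p is a standard argument of chisq.test() that is not shown in the basic syntax.

```r
# Goodness-of-fit: compare observed counts with hypothesized proportions.
# The counts and proportions are illustrative assumptions.
observed <- c(20, 30, 50)
expected_props <- c(0.25, 0.25, 0.50)  # must sum to 1

gof <- chisq.test(observed, p = expected_props)
gof$expected   # expected counts: 25 25 50
gof$statistic  # X-squared = 2
gof$parameter  # df = 2

# Test of independence from two raw factors instead of a pre-built table.
# chisq.test() cross-tabulates x and y internally.
group <- factor(rep(c("A", "B"), times = c(40, 60)))
outcome <- factor(c(rep(c("Success", "Failure"), times = c(30, 10)),
                    rep(c("Success", "Failure"), times = c(20, 40))))

ind <- chisq.test(group, outcome)
ind$p.value
```

Passing two factors is convenient when the data is still one row per observation; if the counts are already tabulated, passing the matrix directly is simpler.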
Example
This example shows how to perform a chi-square test on a 2x2 contingency table to check if two categorical variables are independent.
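```r
# Build a 2x2 contingency table, filled row by row
data <- matrix(c(30, 10, 20, 40), nrow = 2, byrow = TRUE)
rownames(data) <- c("Group1", "Group2")
colnames(data) <- c("Success", "Failure")

# Run the chi-square test of independence
result <- chisq.test(data)
print(result)
```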
Output
Pearson's Chi-squared test with Yates' continuity correction
data:  data
X-squared = 15.042, df = 1, p-value = 0.0001052
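Beyond the printed summary, the object returned by chisq.test() can be inspected field by field. The sketch below rebuilds the 2x2 table from the example and assumes a significance level of 0.05, which is a common convention rather than something the test itself prescribes.

```r
# Rebuild the example table so this block runs on its own
data <- matrix(c(30, 10, 20, 40), nrow = 2, byrow = TRUE,
               dimnames = list(c("Group1", "Group2"),
                               c("Success", "Failure")))
result <- chisq.test(data)

result$statistic  # test statistic (X-squared)
result$parameter  # degrees of freedom
result$p.value    # p-value
result$expected   # expected counts under independence
result$residuals  # Pearson residuals, (observed - expected) / sqrt(expected)

# Decision at an assumed significance level of 0.05
alpha <- 0.05
if (result$p.value < alpha) {
  message("Reject independence: the variables appear to be associated.")
} else {
  message("No significant evidence of association.")
}

# Same test without Yates' continuity correction, for comparison
uncorrected <- chisq.test(data, correct = FALSE)
uncorrected$statistic
```

The uncorrected statistic is always at least as large as the corrected one, because the correction shrinks each |observed - expected| difference before squaring.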
Common Pitfalls
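- Passing a plain vector of counts does not raise an error: chisq.test() treats it as a one-way table and runs a goodness-of-fit test against equal probabilities instead of a test of independence.
- Leaving correct = TRUE applies Yates' correction to 2x2 tables, which makes the test more conservative and may be unnecessary for larger samples.
- Expected counts should not be too small; otherwise, the chi-square approximation may be unreliable and R issues a warning.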
- Confusing the order of rows and columns can lead to misinterpretation.
```r
## Misleading: a plain vector runs a goodness-of-fit test, not a test of independence
chisq.test(c(30, 10, 20, 40))

## Right: build the 2x2 matrix/table first
chisq.test(matrix(c(30, 10, 20, 40), nrow = 2, byrow = TRUE))
```
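The pitfall about small expected counts can be checked programmatically. The following is a minimal sketch, assuming the common rule of thumb that every expected count should be at least 5; the small table is invented for illustration, and falling back to fisher.test() is one possible choice, not the only one.

```r
# Hypothetical 2x2 table with few observations
small <- matrix(c(3, 1, 2, 6), nrow = 2, byrow = TRUE)

# chisq.test() warns when the approximation is doubtful; suppress the
# warning here because we inspect the expected counts explicitly.
expected <- suppressWarnings(chisq.test(small))$expected
expected

# Assumed threshold: the "at least 5 per cell" rule of thumb
if (any(expected < 5)) {
  # Fisher's exact test does not rely on the chi-square approximation
  fallback <- fisher.test(small)
} else {
  fallback <- chisq.test(small)
}
fallback$method
fallback$p.value

# Another option is a Monte Carlo p-value:
# chisq.test(small, simulate.p.value = TRUE, B = 2000)
```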
Quick Reference
| Item | Description |
|---|---|
| x | Observed counts as a table, matrix, or vector |
| y | Optional second factor (ignored if x is a matrix) |
| correct | Apply Yates' continuity correction to 2x2 tables (default TRUE) |
| p-value | Output: probability value used to decide significance |
| X-squared | Output: chi-square test statistic |
Key Takeaways
- Use chisq.test() with a contingency table or matrix of counts to perform the chi-square test in R.
- Yates' continuity correction is applied by default for 2x2 tables but can be disabled with correct = FALSE.
- Ensure your data is in the correct table or matrix format before running the test.
- Check that expected counts are not too small to keep the test valid.
- Interpret the p-value to decide if there is a significant association between variables.