How to Use the Shapiro-Wilk Test for Normality in R
Use the shapiro.test() function in R to check whether your data is normally distributed. Pass a numeric vector to shapiro.test(), and it returns a test statistic and a p-value to help you assess normality.
Syntax
The basic syntax of the Shapiro-Wilk test in R is:
- shapiro.test(x): where x is a numeric vector of data values.
- The function returns a list with the test statistic W and the p-value.
- A small p-value (usually < 0.05) suggests the data is not normally distributed.
```r
shapiro.test(x)
```
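The returned object is a list of class "htest", so the statistic and the p-value can be pulled out by name. The following is a minimal sketch, assuming a simulated vector x; the data and the variable names (res, w_stat, p_val) are illustrative and not from a real dataset.

```r
# Illustrative data: 30 draws from a normal distribution (an assumption for this sketch)
set.seed(1)
x <- rnorm(30)

res <- shapiro.test(x)

# The result is an "htest" list; its elements are accessed by name
w_stat <- unname(res$statistic)  # the W statistic as a plain number
p_val  <- res$p.value            # the p-value

cat("W =", round(w_stat, 4), "\n")
cat("p-value =", round(p_val, 4), "\n")
```

Storing the values this way is handy when the result feeds into later code rather than being read off the printed summary.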
Example
This example shows how to use shapiro.test() on a sample numeric vector to check for normality.
```r
set.seed(123)
data <- rnorm(20, mean = 5, sd = 2)  # generate 20 random normal values
result <- shapiro.test(data)
print(result)
```
Output
Shapiro-Wilk normality test
data: data
W = 0.97494, p-value = 0.8479
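For contrast, the same call on clearly skewed data will typically give a small p-value. The sketch below assumes a simulated exponential sample (skewed) and a 0.05 cutoff stored in alpha; both are illustrative choices rather than part of the original example.

```r
set.seed(42)
skewed <- rexp(200, rate = 1)  # right-skewed sample (illustrative)

alpha <- 0.05                  # conventional cutoff; adjust as needed
res <- shapiro.test(skewed)

# Compare the p-value against the cutoff and report the decision
if (res$p.value < alpha) {
  message("Evidence against normality (p = ", signif(res$p.value, 3), ")")
} else {
  message("No strong evidence against normality (p = ", signif(res$p.value, 3), ")")
}
```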
Common Pitfalls
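- Sample size: shapiro.test() accepts only between 3 and 5000 non-missing values and stops with an error outside that range. Within it, very small samples give the test little power to detect non-normality, while very large samples can flag deviations too small to matter in practice.
- Input type: Passing non-numeric data causes an error. Missing values (NA) are allowed and are dropped before the test is computed.
- Interpretation: A p-value > 0.05 does not prove normality; it only means the test found no strong evidence against it.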
```r
## Wrong: passing non-numeric data
# shapiro.test(c("a", "b", "c"))  # Error

## Right: numeric vector (any NA values would be dropped automatically)
shapiro.test(c(1.2, 2.3, 3.1, 4.5))
```
Output
Shapiro-Wilk normality test
data: c(1.2, 2.3, 3.1, 4.5)
W = 0.9453, p-value = 0.6943
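The input checks discussed in this section can be wrapped in a small helper. This is only a sketch: check_normality is a hypothetical name, not part of base R, and it relies on shapiro.test() dropping NA values and requiring between 3 and 5000 non-missing numeric values.

```r
# Hypothetical helper (not part of base R): validate input before testing
check_normality <- function(x) {
  if (!is.numeric(x)) {
    stop("x must be a numeric vector")
  }
  n <- sum(!is.na(x))  # shapiro.test() ignores NA values
  if (n < 3 || n > 5000) {
    stop("need between 3 and 5000 non-missing values, got ", n)
  }
  shapiro.test(x)
}

# NA values are dropped, so the test runs on the four remaining numbers
res <- check_normality(c(1.2, NA, 2.3, 3.1, 4.5))

# Too few values: the helper raises an error, captured here as a message string
too_small <- tryCatch(check_normality(c(1, 2)),
                      error = function(e) conditionMessage(e))
print(too_small)
```

Failing early with an explicit message tends to be easier to debug than the generic errors raised by shapiro.test() itself.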
Quick Reference
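| Name | Role | Description |
|---|---|---|
| x | Argument | Numeric vector of data values to test |
| W | Returned statistic | Shapiro-Wilk test statistic (between 0 and 1; values near 1 are consistent with normality) |
| p-value | Returned value | Probability of observing a W this small or smaller if the data were normally distributed (common cutoff 0.05) |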
Key Takeaways
- Use shapiro.test() with a numeric vector to check normality in R.
- A p-value below 0.05 suggests the data is not normally distributed.
- Make sure your data is numeric; missing values are dropped automatically.
- The test requires between 3 and 5000 non-missing values.
- Interpret results carefully; a high p-value does not guarantee normality.