
How to Use Shapiro Test for Normality in R

Use the shapiro.test() function in R to test whether a sample is consistent with a normal distribution. Pass a numeric vector to shapiro.test(); it returns the test statistic W and a p-value for the null hypothesis that the data come from a normal distribution.


Syntax

The basic syntax of the Shapiro-Wilk test in R is:

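shapiro.test(x): x is a numeric vector of data values.
The function returns an object of class "htest" (a list) whose elements include the test statistic W (statistic) and the p-value (p.value).
The null hypothesis is that the data come from a normal distribution, so a small p-value (usually < 0.05) suggests the data are not normally distributed.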

shapiro.test(x)
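
Because the return value is a list, its pieces can be pulled out by name instead of being read off the printed output. The following is a minimal sketch; the seed, the sample, and the variable names (res, w_stat, p_value) are illustrative assumptions, not part of the original article.

```r
set.seed(42)                       # arbitrary seed, used only for reproducibility
x <- rnorm(30)                     # illustrative sample of 30 values

res <- shapiro.test(x)

class(res)                         # the result is an "htest" object
names(res)                         # includes "statistic" and "p.value"

w_stat  <- unname(res$statistic)   # W as a plain number, without its name
p_value <- res$p.value             # p-value as a plain number
```

Storing the values this way is handy when the test is run inside a loop or a function and the result feeds into later code.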


Example

This example shows how to use shapiro.test() on a sample numeric vector to check for normality.

set.seed(123)
data <- rnorm(20, mean = 5, sd = 2)  # generate 20 random normal values
result <- shapiro.test(data)
print(result)

Output

	Shapiro-Wilk normality test

data:  data
W = 0.97494, p-value = 0.8479
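
To turn the result into a decision, compare the p-value with a chosen significance level. The sketch below assumes a cutoff of 0.05 and uses a hypothetical helper, rejects_normality(), which is not a base R function; the exponential sample is added only as a contrast to the normal one.

```r
# Hypothetical helper: TRUE when the test rejects normality at level alpha
rejects_normality <- function(x, alpha = 0.05) {
  shapiro.test(x)$p.value < alpha
}

set.seed(123)
normal_sample <- rnorm(20, mean = 5, sd = 2)  # same data as the example above
skewed_sample <- rexp(200, rate = 1)          # strongly right-skewed, for contrast

rejects_normality(normal_sample)  # decision for the normal sample
rejects_normality(skewed_sample)  # decision for the skewed sample
```

A FALSE result means only that the test found no strong evidence against normality at the chosen level, not that the data are normal.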


Common Pitfalls

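Sample size: shapiro.test() stops with an error unless the number of non-missing values is between 3 and 5000. Within that range, very small samples give the test little power to detect non-normality, while very large samples can flag deviations too small to matter in practice.
Input type: Passing non-numeric data causes an error. Missing values (NA) are allowed and are dropped before the test is run, so check how many values remain. A vector whose values are all identical also causes an error.
Interpretation: A p-value > 0.05 does not prove normality; it only means there is no strong evidence against it.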

## Wrong: passing non-numeric data
# shapiro.test(c("a", "b", "c"))  # Error

## Right: numeric vector with at least 3 non-missing values
shapiro.test(c(1.2, 2.3, 3.1, 4.5))

Output

	Shapiro-Wilk normality test

data:  c(1.2, 2.3, 3.1, 4.5)
W = 0.9453, p-value = 0.6943
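
The input pitfalls listed above can be checked without halting a script by wrapping the call in tryCatch(). This is a minimal sketch; try_shapiro() is a hypothetical wrapper introduced here for illustration, and the input vectors are made up.

```r
# Hypothetical wrapper: return the p-value, or the error message if the call fails
try_shapiro <- function(x) {
  tryCatch(shapiro.test(x)$p.value, error = function(e) conditionMessage(e))
}

try_shapiro(c("a", "b", "c"))   # non-numeric input: returns an error message
try_shapiro(c(1.2, 2.3))        # fewer than 3 values: returns an error message
try_shapiro(seq_len(5001))      # more than 5000 values: returns an error message
try_shapiro(rep(4, 10))         # all values identical: returns an error message

# Missing values are dropped, so this runs on the 4 remaining values
with_na <- c(1.2, NA, 2.3, 3.1, 4.5)
p_with_na <- try_shapiro(with_na)

# Same test after removing the NA explicitly
p_clean <- try_shapiro(with_na[!is.na(with_na)])
```

Removing NA values explicitly makes the effective sample size visible in the code, which helps when the count is close to the lower limit of 3.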


Quick Reference

Term	Description
x	Input: numeric vector of data values to test
W	Output: Shapiro-Wilk test statistic (between 0 and 1; values near 1 are consistent with normality)
p-value	Output: probability of a W at least this extreme if the data were normal (common cutoff 0.05)


Key Takeaways

Use shapiro.test() with a numeric vector to check normality in R.

A p-value below 0.05 suggests the data are not normally distributed.

Ensure your data is numeric; missing values are dropped automatically, so check how many values remain.

shapiro.test() requires between 3 and 5000 non-missing values.

Interpret results carefully; a high p-value does not guarantee normality.