How to Calculate Correlation in R: Simple Guide
In R, you calculate the correlation between two numeric vectors using the cor() function. By default, this function returns the Pearson correlation coefficient, which measures the strength and direction of a linear relationship between variables.
Syntax
The basic syntax to calculate correlation in R is:
```r
cor(x, y, method = "pearson")
```
Where:
- x and y are numeric vectors or columns.
- method specifies the correlation type: "pearson" (default), "spearman", or "kendall".
Example
This example shows how to calculate the Pearson correlation between two numeric vectors a and b.
```r
a <- c(1, 2, 3, 4, 5)
b <- c(2, 4, 6, 8, 10)   # b is exactly 2 * a
correlation <- cor(a, b)
print(correlation)
```
Output
[1] 1
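The method argument described in the Syntax section matters most when the relationship is monotonic but not linear. The following is a minimal sketch using made-up data (x and its cube) to compare the three methods; the variable names are illustrative only.

```r
# Monotonic but non-linear relationship: y grows as the cube of x
x <- 1:10
y <- x^3

pearson  <- cor(x, y, method = "pearson")   # measures linear association
spearman <- cor(x, y, method = "spearman")  # rank-based
kendall  <- cor(x, y, method = "kendall")   # rank-based (concordant pairs)

print(c(pearson = pearson, spearman = spearman, kendall = kendall))
```

Because the ranks of x and y agree perfectly, the two rank-based coefficients equal 1, while the Pearson coefficient falls below 1 since the points do not lie on a straight line.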
Common Pitfalls
Common mistakes when calculating correlation in R include:
- Passing non-numeric data, which causes errors.
- Ignoring missing values (NA), which can result in NA output.
- Using the wrong method for your data type.
To handle missing values, set the use argument, for example use = "complete.obs", which drops every pair where either value is missing.
```r
x <- c(1, 2, NA, 4)
y <- c(2, 4, 6, NA)

# Wrong: returns NA because of missing values
cor(x, y)

# Right: ignore missing pairs
cor(x, y, use = "complete.obs")
```
Output
[1] NA
[1] 1
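The non-numeric pitfall usually shows up when passing a whole data frame that contains a text column. One possible way to avoid it, sketched here with a hypothetical data frame df, is to keep only the numeric columns before calling cor().

```r
# Hypothetical data: two numeric columns and one character column
df <- data.frame(
  height = c(150, 160, 170, 180),
  weight = c(50, 58, 69, 80),
  group  = c("a", "b", "a", "b"),
  stringsAsFactors = FALSE
)

# cor(df) would stop with an error because `group` is not numeric,
# so select the numeric columns first
numeric_cols <- df[, sapply(df, is.numeric), drop = FALSE]

# With a data frame or matrix, cor() returns a correlation matrix
cor_matrix <- cor(numeric_cols)
print(cor_matrix)
```

The result is a 2 x 2 matrix with ones on the diagonal and the height-weight correlation in the off-diagonal cells.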
Quick Reference
| Argument | Description |
|---|---|
| x, y | Numeric vectors or columns to compare |
| method | "pearson" (default), "spearman", or "kendall" correlation type |
| use | How to handle missing values: "everything" (default), "all.obs", "complete.obs", "na.or.complete", or "pairwise.complete.obs" |
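The difference between "complete.obs" and "pairwise.complete.obs" only appears with more than two columns. The sketch below uses a small made-up matrix m to illustrate it: "complete.obs" drops every row containing any NA, while "pairwise.complete.obs" drops rows separately for each pair of columns.

```r
# Made-up matrix with missing values in different rows
m <- cbind(
  a = c(1, 2, 3, 4, NA),
  b = c(2, 4, 6, 8, 10),
  c = c(5, 3, NA, 1, 0)
)

# Uses only rows 1, 2 and 4, which have no NA in any column
complete_cor <- cor(m, use = "complete.obs")

# For each pair of columns, uses every row where both values are present
pairwise_cor <- cor(m, use = "pairwise.complete.obs")

print(complete_cor)
print(pairwise_cor)
```

Here the b-c entry differs between the two matrices, because the pairwise version can use rows 1, 2, 4 and 5 for that pair while the complete version is limited to rows 1, 2 and 4.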
Key Takeaways
- Use the cor() function to calculate correlation between numeric vectors in R.
- Choose the correlation method based on your data: Pearson for linear relationships, Spearman or Kendall for rank-based (monotonic) relationships.
- Handle missing values with the use argument to avoid NA results.
- Input data must be numeric; non-numeric inputs cause errors.
- cor() returns a value between -1 and 1 indicating the strength and direction of the correlation.
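To make the sign and range of the coefficient concrete, here is a small sketch with made-up vectors where one variable tends to fall as the other rises. It also checks the default result against the standard Pearson formula, covariance divided by the product of the standard deviations.

```r
a <- c(1, 2, 3, 4, 5)
b <- c(9, 7, 8, 4, 3)   # tends to decrease as a increases

r <- cor(a, b)

# Pearson correlation computed by hand: cov(a, b) / (sd(a) * sd(b))
manual <- cov(a, b) / (sd(a) * sd(b))

print(r)
print(manual)
```

The coefficient is negative because b decreases as a increases, and it matches the manual calculation.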