
How to Calculate Correlation in R: Simple Guide

In R, you calculate correlation between two numeric vectors using the cor() function. This function returns the correlation coefficient, which measures the strength and direction of a linear relationship between variables.
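As a rough illustration of "strength and direction", the sketch below uses made-up values (x, y_up, and y_down are illustrative names, not data from this guide). It also checks the result against the usual definition of Pearson's coefficient: covariance divided by the product of the standard deviations.

```r
x <- c(1, 2, 3, 4, 5)

# Illustrative data: y_up rises with x, y_down falls with x
y_up   <- c(2.1, 3.9, 6.2, 7.8, 10.1)
y_down <- c(9.8, 8.1, 6.0, 4.2, 1.9)

r_up   <- cor(x, y_up)    # close to +1: strong positive relationship
r_down <- cor(x, y_down)  # close to -1: strong negative relationship

# Pearson's r is the covariance scaled by both standard deviations
r_manual <- cov(x, y_up) / (sd(x) * sd(y_up))
print(all.equal(r_up, r_manual))
```

The sign gives the direction of the relationship, and the distance from zero gives its strength.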

Syntax

The basic syntax to calculate correlation in R is:

```r
cor(x, y, method = "pearson")
```

Where:

  • x and y are numeric vectors of equal length, such as two columns of a data frame.
  • method specifies the correlation type: "pearson" (default), "spearman", or "kendall".
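To see how the method argument changes the result, here is a minimal sketch using made-up data in which y grows with x but not along a straight line. The variable names are illustrative only.

```r
x <- 1:10
y <- x^3  # monotonic (always increasing) but not linear

pearson  <- cor(x, y, method = "pearson")   # measures linear association
spearman <- cor(x, y, method = "spearman")  # measures association between ranks
kendall  <- cor(x, y, method = "kendall")   # compares concordant and discordant pairs

print(c(pearson = pearson, spearman = spearman, kendall = kendall))
```

Because the ordering of y matches the ordering of x exactly, the two rank-based methods report a perfect relationship, while Pearson comes out below 1 because the points do not lie on a straight line.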

Example

This example shows how to calculate the Pearson correlation between two numeric vectors a and b.

```r
a <- c(1, 2, 3, 4, 5)
b <- c(2, 4, 6, 8, 10)  # b is exactly 2 * a, a perfect linear relationship

# method defaults to "pearson"
correlation <- cor(a, b)
print(correlation)
```

Output

```
[1] 1
```
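The same call works on data frame columns. The sketch below assumes the built-in mtcars dataset that ships with R. The column choice (mpg, wt, hp) is only for illustration.

```r
# Correlation between two columns of a data frame
r_mpg_wt <- cor(mtcars$mpg, mtcars$wt)
print(r_mpg_wt)  # negative: heavier cars tend to have lower mpg

# Passing several numeric columns at once returns a correlation matrix
cor_matrix <- cor(mtcars[, c("mpg", "wt", "hp")])
print(round(cor_matrix, 2))
```

In the matrix form, each cell holds the correlation between the row variable and the column variable, so the diagonal is always 1.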

Common Pitfalls

Common mistakes when calculating correlation in R include:

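  • Passing non-numeric data (for example, character vectors), which causes an error.
  • Leaving missing values (NA) in the data, which makes cor() return NA by default.
  • Using Pearson when the relationship is rank-based rather than linear, where Spearman or Kendall is the better choice.

To handle missing values, set the use argument. For example, use = "complete.obs" drops every pair in which either value is missing.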

```r
x <- c(1, 2, NA, 4)
y <- c(2, 4, 6, NA)

# Wrong: returns NA because of missing values
cor(x, y)

# Right: use only the pairs where both values are present
cor(x, y, use = "complete.obs")
```

Output

```
[1] NA
[1] 1
```
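The non-numeric pitfall can be sketched in a similar way. The example below assumes numbers that were read in as text, which is a common cause. The names heights_chr and weights are hypothetical, and the values are made up.

```r
heights_chr <- c("150", "160", "170", "180")  # numbers stored as character strings
weights     <- c(52, 58, 69, 77)

# Wrong: cor() stops with an error on character input.
# tryCatch captures the error message instead of halting the script.
result <- tryCatch(
  cor(heights_chr, weights),
  error = function(e) conditionMessage(e)
)
print(result)

# Right: convert to numeric first, then compute the correlation
heights_num <- as.numeric(heights_chr)
r_fixed <- cor(heights_num, weights)
print(r_fixed)
```

If the text contains entries that cannot be parsed as numbers, as.numeric() turns them into NA with a warning, so the missing-value handling shown above still applies.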

Quick Reference

| Argument | Description |
|----------|-------------|
| x, y | Numeric vectors or columns to compare |
| method | Correlation type: "pearson" (default), "spearman", or "kendall" |
| use | How to handle missing values: "everything" (default), "all.obs", "complete.obs", "na.or.complete", or "pairwise.complete.obs" |
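The difference between two of the use options shows up once there are more than two columns. The sketch below uses a small made-up data frame (scores and its columns p, q, and r are hypothetical names) with missing values in different rows.

```r
scores <- data.frame(
  p = c(1, 2, 3, 4, NA),
  q = c(2, 4, 5, 8, 10),
  r = c(5, NA, 3, 2, 1)
)

# "complete.obs": keep only rows with no NA in any column (rows 1, 3, 4 here)
cor_complete <- cor(scores, use = "complete.obs")

# "pairwise.complete.obs": for each pair of columns, keep the rows
# where both of those two columns are present
cor_pairwise <- cor(scores, use = "pairwise.complete.obs")

print(round(cor_complete, 3))
print(round(cor_pairwise, 3))
```

The pairwise option keeps more data for each coefficient, but the cells of the resulting matrix may then be based on different sets of rows.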

Key Takeaways

  • Use the cor() function to calculate correlation between numeric vectors in R.
  • Choose the correlation method based on your data: Pearson for linear relationships, Spearman or Kendall for rank-based relationships.
  • Handle missing values with the use argument to avoid NA results.
  • Input data must be numeric; non-numeric inputs cause errors.
  • cor() returns a value between -1 and 1 indicating the strength and direction of the correlation.