How to Evaluate Model in R: Syntax, Example, and Tips
To evaluate a model in R, use functions like
summary() for model details, predict() to generate predictions, and metrics such as confusionMatrix() for classification or RMSE() for regression (both from the caret package). These tools help measure how well your model performs on test data.
Syntax
Here are common functions used to evaluate models in R:
- summary(model): Shows model details and statistics.
- predict(model, newdata): Generates predictions on new data.
- confusionMatrix(predictions, actuals): Calculates classification accuracy and other metrics (from the caret package).
- RMSE(predictions, actuals): Computes root mean squared error for regression (from the caret package).
```r
summary(model)                         # model details and statistics
predictions <- predict(model, newdata)
confusionMatrix(predictions, actuals)  # classification
RMSE(predictions, actuals)             # regression
```
Example
This example shows how to train a logistic regression model on the built-in iris dataset and evaluate it using a confusion matrix.
```r
library(caret)

# Prepare data: binary classification (setosa vs. others)
iris$IsSetosa <- ifelse(iris$Species == "setosa", "Yes", "No")
iris$IsSetosa <- as.factor(iris$IsSetosa)

# Split data into training and testing sets
set.seed(123)
trainIndex <- createDataPartition(iris$IsSetosa, p = 0.7, list = FALSE)
trainData <- iris[trainIndex, ]
testData <- iris[-trainIndex, ]

# Train a logistic regression model
model <- train(IsSetosa ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
               data = trainData, method = "glm", family = "binomial")

# Predict on test data
predictions <- predict(model, testData)

# Evaluate with a confusion matrix
confMat <- confusionMatrix(predictions, testData$IsSetosa)
print(confMat)
```
Output
```
Confusion Matrix and Statistics

          Reference
Prediction No Yes
       No  39   0
       Yes  1  31

               Accuracy : 0.9857
                 95% CI : (0.9147, 0.9994)
    No Information Rate : 0.5429
    P-Value [Acc > NIR] : 1.02e-11

                  Kappa : 0.9714

 Mcnemar's Test P-Value : 1

            Sensitivity : 0.9744
            Specificity : 1.0000
         Pos Pred Value : 1.0000
         Neg Pred Value : 0.9750
             Prevalence : 0.4571
         Detection Rate : 0.4457
   Detection Prevalence : 0.4457
      Balanced Accuracy : 0.9872
```
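The syntax section also lists RMSE() for regression, but the example above covers only classification. As a complement, here is a minimal regression sketch that computes RMSE by hand in base R (using the built-in mtcars dataset), so it runs even without caret installed:

```r
# Regression evaluation sketch: linear model on mtcars, scored with RMSE.
# Base R only; caret's RMSE() would give the same number.
set.seed(123)
trainIndex <- sample(seq_len(nrow(mtcars)), size = round(0.7 * nrow(mtcars)))
trainData <- mtcars[trainIndex, ]
testData  <- mtcars[-trainIndex, ]

# Fit a linear regression predicting fuel efficiency
model <- lm(mpg ~ wt + hp, data = trainData)

# Predict on held-out test data
predictions <- predict(model, testData)

# RMSE: square root of the mean squared prediction error
rmse <- sqrt(mean((predictions - testData$mpg)^2))
print(rmse)
```

Lower RMSE means predictions sit closer to the actual values; it is reported in the same units as the outcome (miles per gallon here).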
Common Pitfalls
Common mistakes when evaluating models in R include:
- Using training data for evaluation, which produces overly optimistic results.
- Not converting predicted probabilities to class labels before computing a confusion matrix.
- Ignoring class imbalance, which can make accuracy misleading.
- Using the wrong metric for the problem type (e.g., accuracy for a regression model).
Always split data into training and testing sets and choose metrics that fit your model type.
```r
## Wrong: evaluating on training data
# predictions <- predict(model, trainData)
# confusionMatrix(predictions, trainData$IsSetosa)

## Right: evaluate on held-out test data
# predictions <- predict(model, testData)
# confusionMatrix(predictions, testData$IsSetosa)
```
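The probabilities-versus-labels pitfall shows up with a plain glm() model, whose predict(..., type = "response") returns probabilities rather than classes. A minimal sketch using the built-in mtcars data, with base R's table() standing in for caret's confusionMatrix():

```r
# Fit a logistic regression on mtcars: transmission type (am: 0/1)
model <- glm(am ~ wt + hp, data = mtcars, family = "binomial")

# type = "response" yields predicted probabilities in [0, 1], not labels
probs <- predict(model, mtcars, type = "response")

# Convert probabilities to class labels before building a confusion matrix
labels <- ifelse(probs > 0.5, 1, 0)
table(Predicted = labels, Actual = mtcars$am)
```

Passing probs straight into a confusion-matrix function would fail or mislead, because the predicted and actual values must be on the same scale (class labels).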
Quick Reference
| Function | Purpose | Use Case |
|---|---|---|
| summary(model) | Show model details and statistics | Any model type |
| predict(model, newdata) | Generate predictions on new data | Any model type |
| confusionMatrix(predictions, actuals) | Evaluate classification accuracy and metrics | Classification models |
| RMSE(predictions, actuals) | Calculate root mean squared error | Regression models |
| createDataPartition() | Split data into training and testing sets | Data preparation |
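Note that createDataPartition() performs a stratified split: it preserves the class proportions of the outcome in both subsets. When caret is not available, a plain random split with base R's sample() is a common (though unstratified) alternative, sketched here on the iris data:

```r
# Base R alternative to caret::createDataPartition() (no stratification)
set.seed(123)
idx <- sample(seq_len(nrow(iris)), size = round(0.7 * nrow(iris)))
trainData <- iris[idx, ]
testData  <- iris[-idx, ]

nrow(trainData)  # 105 of 150 rows (70%)
nrow(testData)   # remaining 45 rows
```

For imbalanced classification data, prefer the stratified split so rare classes appear in both the training and test sets.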
Key Takeaways
- Always split your data into training and testing sets before evaluation.
- Use predict() to get model predictions on new data.
- Choose evaluation metrics that match your model type: classification or regression.
- Use confusionMatrix() for classification and RMSE() for regression.
- Avoid evaluating your model on the same data it was trained on to prevent biased results.