
Model evaluation best practices in Computer Vision

Introduction
We evaluate a model so that we can trust its decisions and improve it where needed. Typical situations:
After training, to see whether the model recognizes images correctly.
Before deploying the model in a real app, to catch mistakes early.
When comparing two models, to pick the better one.
To find out whether the model is overfitting or underfitting.
To measure progress while improving the model.
Syntax
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Example: Calculate accuracy
accuracy = accuracy_score(true_labels, predicted_labels)

# Other metrics
precision = precision_score(true_labels, predicted_labels, average='weighted')
recall = recall_score(true_labels, predicted_labels, average='weighted')
f1 = f1_score(true_labels, predicted_labels, average='weighted')
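Choose the metric that fits your task; accuracy is common but can mislead, for example when classes are imbalanced.
For multi-class problems, set the 'average' parameter (for example 'macro', 'micro', or 'weighted') to combine per-class scores into a single number.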
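To make the effect of 'average' concrete, here is a minimal sketch that computes per-class precision by hand and then combines it in two ways. The helper `per_class_precision` and the toy labels are illustrative assumptions, not part of scikit-learn. A macro average treats every class equally, while a weighted average weights each class by its number of true samples.

```python
from collections import Counter

def per_class_precision(y_true, y_pred):
    """Return {label: precision} computed directly from the label lists."""
    labels = sorted(set(y_true) | set(y_pred))
    result = {}
    for label in labels:
        predicted = sum(1 for p in y_pred if p == label)
        correct = sum(1 for t, p in zip(y_true, y_pred) if t == p == label)
        # Convention assumed here: precision is 0.0 when a class is never predicted.
        result[label] = correct / predicted if predicted else 0.0
    return result

# Imbalanced toy labels: class 0 is common, class 2 is rare and never predicted.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 2]
y_pred = [0, 0, 0, 0, 0, 0, 1, 1, 0, 0]

support = Counter(y_true)  # number of true samples per class
precisions = per_class_precision(y_true, y_pred)

macro = sum(precisions.values()) / len(precisions)
weighted = sum(precisions[c] * support[c] for c in precisions) / len(y_true)

print(f"macro: {macro:.3f}, weighted: {weighted:.3f}")  # macro: 0.583, weighted: 0.750
```

The macro score drops sharply because the rare class counts as much as the common one, which is often what you want to see when rare classes matter.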
Examples
Calculates accuracy for a small example of true and predicted labels.
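from sklearn.metrics import accuracy_score
# First list: true labels; second list: predicted labels (3 of 4 match)
accuracy = accuracy_score([0,1,1,0], [0,1,0,0])
print(f"Accuracy: {accuracy}")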
Calculates weighted precision for multi-class predictions.
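from sklearn.metrics import precision_score
# 'weighted' averages per-class precision, weighted by each class's true sample count
precision = precision_score([0,1,1,0], [0,1,0,0], average='weighted')
print(f"Precision: {precision}")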
Calculates recall and F1 score to show how the model balances precision and recall.
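from sklearn.metrics import recall_score, f1_score
# First list: true labels; second list: predicted labels
recall = recall_score([0,1,1,0], [0,1,0,0], average='weighted')
f1 = f1_score([0,1,1,0], [0,1,0,0], average='weighted')
print(f"Recall: {recall}, F1 Score: {f1}")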
Sample Model
This program trains a Random Forest model on digit images and evaluates it using common metrics to check how well it predicts.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Load sample image data (digits)
digits = load_digits()
X = digits.images.reshape((len(digits.images), -1))
y = digits.target

# Split data into train and test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train a simple model
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Predict on test data
predictions = model.predict(X_test)

# Evaluate model
accuracy = accuracy_score(y_test, predictions)
precision = precision_score(y_test, predictions, average='weighted')
recall = recall_score(y_test, predictions, average='weighted')
f1 = f1_score(y_test, predictions, average='weighted')

print(f"Accuracy: {accuracy:.3f}")
print(f"Precision: {precision:.3f}")
print(f"Recall: {recall:.3f}")
print(f"F1 Score: {f1:.3f}")
Important Notes
Always split your data into training and testing sets to get an honest evaluation.
Use multiple metrics to get a full picture of model performance.
Beware of imbalanced data; accuracy alone can be misleading.
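The last note can be illustrated with a toy case. The sketch below assumes a hypothetical dataset of 100 images in which only 5 contain the object of interest, and a trivial "model" that always predicts the negative class. Both the data and the baseline are made up for illustration.

```python
# Hypothetical labels: 95 negatives (0) and 5 positives (1).
y_true = [0] * 95 + [1] * 5
# A trivial baseline that never predicts the positive class.
y_pred = [0] * 100

# Accuracy: fraction of predictions that match the true label.
correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
accuracy = correct / len(y_true)

# Recall for the positive class: fraction of actual positives that were found.
true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
actual_positives = sum(1 for t in y_true if t == 1)
recall = true_positives / actual_positives

print(f"Accuracy: {accuracy:.2f}")  # Accuracy: 0.95
print(f"Recall:   {recall:.2f}")    # Recall:   0.00
```

The baseline looks strong on accuracy yet finds none of the positives, which is why recall, precision, or F1 should be reported alongside accuracy.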
Summary
Model evaluation tells us how good our model is at its task.
Use train/test split and multiple metrics like accuracy, precision, recall, and F1 score.
Good evaluation helps improve models and avoid mistakes in real use.

Practice

1. Why is it important to use a separate test set when evaluating a computer vision model?
easy
A. To check how well the model performs on new, unseen data
B. To make the training process faster
C. To increase the size of the training data
D. To reduce the number of model parameters

Solution

  1. Step 1: Understand the purpose of a test set

    The test set is data the model has never seen before, used to check real-world performance.
  2. Step 2: Compare test set role with other options

    Options B, C, and D do not relate to evaluation but to training or model design.
  3. Final Answer:

    To check how well the model performs on new, unseen data -> Option A
  4. Quick Check:

    Test set = unseen data check [OK]
Hint: Test set = new data to check model accuracy [OK]
Common Mistakes:
  • Confusing test set with training set
  • Thinking test set speeds up training
  • Believing test set changes model size
2. Which of the following is the correct way to split data for model evaluation in Python using scikit-learn?
easy
A. split_train_test(data, 0.2)
B. train_test_split(data, test_size=0.2, random_state=42)
C. train_test(data, 0.2)
D. test_train_split(data, 0.2)

Solution

  1. Step 1: Recall the correct function name in scikit-learn

    The function to split data is called train_test_split with parameters like test_size and random_state.
  2. Step 2: Check the options for correct syntax

    Only train_test_split(data, test_size=0.2, random_state=42) uses the correct function name and parameters; others are invalid or do not exist.
  3. Final Answer:

    train_test_split(data, test_size=0.2, random_state=42) -> Option B
  4. Quick Check:

    Correct function = train_test_split [OK]
Hint: Remember scikit-learn function: train_test_split [OK]
Common Mistakes:
  • Using wrong function names
  • Missing required parameters
  • Confusing order of train and test
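In practice `train_test_split` is usually called with both features and labels. The sketch below uses made-up toy data; `stratify=y` is an optional argument that keeps the class proportions similar in the training and test parts.

```python
from sklearn.model_selection import train_test_split

# Toy data: 10 samples with 2 features each, 5 per class (illustrative only).
X = [[i, i * 2] for i in range(10)]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# Hold out 20% of the samples; random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(len(X_train), len(X_test))  # 8 2
print(sorted(y_test))             # [0, 1]
```

Note the order of the returned values: training features, test features, training labels, test labels.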
3. Given the following code snippet, what will be the printed accuracy?
from sklearn.metrics import accuracy_score
true_labels = [1, 0, 1, 1, 0]
pred_labels = [1, 0, 0, 1, 0]
accuracy = accuracy_score(true_labels, pred_labels)
print("{:.2f}".format(round(accuracy, 2)))
medium
A. 0.60
B. 0.40
C. 0.80
D. 1.00

Solution

  1. Step 1: Compare true and predicted labels

    True: [1, 0, 1, 1, 0], Predicted: [1, 0, 0, 1, 0]. Matches at positions 0,1,3,4 (4 correct out of 5).
  2. Step 2: Calculate accuracy

    Accuracy = correct predictions / total = 4/5 = 0.8. Rounded to 2 decimals is 0.80.
  3. Final Answer:

    0.80 -> Option C
  4. Quick Check:

    Accuracy = 4/5 = 0.80 [OK]
Hint: Count matches, divide by total labels [OK]
Common Mistakes:
  • Counting wrong matches
  • Not rounding accuracy
  • Confusing accuracy with precision
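The arithmetic in the solution above can be checked without any library. This is a minimal sketch of the same count-and-divide calculation; the variable name `matches` is introduced here for illustration.

```python
true_labels = [1, 0, 1, 1, 0]
pred_labels = [1, 0, 0, 1, 0]

# Indices where the prediction equals the true label.
matches = [i for i, (t, p) in enumerate(zip(true_labels, pred_labels)) if t == p]
accuracy = len(matches) / len(true_labels)

print(matches)            # [0, 1, 3, 4]
print(f"{accuracy:.2f}")  # 0.80
```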
4. You trained a model but the test accuracy is much higher than the training accuracy. What is the most likely issue?
medium
A. Data leakage between training and test sets
B. Model is underfitting the training data
C. Test set is too small
D. Training data is too large

Solution

  1. Step 1: Understand unusual accuracy pattern

    Test accuracy well above training accuracy is unusual and usually points to a problem with how the data was split, such as test samples also appearing in the training data.
  2. Step 2: Identify cause from options

    Data leakage means test data was accidentally used in training, which inflates test accuracy.
  3. Final Answer:

    Data leakage between training and test sets -> Option A
  4. Quick Check:

    Test accuracy > training accuracy = suspect data leakage [OK]
Hint: Test accuracy higher than training accuracy? Check for data leakage [OK]
Common Mistakes:
  • Assuming underfitting causes higher test accuracy
  • Ignoring data leakage possibility
  • Blaming test set size without evidence
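One simple way to look for the leakage described above is to check whether any test sample also appears in the training set. The sketch below rests on several assumptions: the helper `find_overlap` is hypothetical, images are represented as flat lists of pixel values, and only exact duplicates are detected. Near-duplicates such as resized or re-encoded copies would need a different approach.

```python
def find_overlap(train_images, test_images):
    """Return indices of test images that exactly match some training image."""
    # Tuples are hashable, so they can be stored in a set for fast lookup.
    train_set = {tuple(img) for img in train_images}
    return [i for i, img in enumerate(test_images) if tuple(img) in train_set]

# Toy 2x2 "images" flattened to 4 pixel values (illustrative data).
train_images = [[0, 0, 255, 255], [10, 20, 30, 40], [5, 5, 5, 5]]
test_images = [[1, 2, 3, 4], [10, 20, 30, 40]]

leaked = find_overlap(train_images, test_images)
print(leaked)  # [1]
```

An empty result does not prove the split is clean, but a non-empty one is a clear sign that the test score cannot be trusted.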
5. You want to evaluate a computer vision model for detecting rare objects in images. Which evaluation metric is best to use and why?
hard
A. Confusion matrix, because it shows training time
B. Accuracy, because it shows overall correct predictions
C. Mean Squared Error, because it measures prediction error
D. F1 score, because it balances precision and recall for imbalanced data

Solution

  1. Step 1: Understand the problem of rare object detection

    Rare objects mean the data is imbalanced: many negatives, few positives.
  2. Step 2: Choose metric suitable for imbalanced data

    F1 score balances precision (correct positive predictions) and recall (finding all positives), ideal for rare classes.
  3. Final Answer:

    F1 score, because it balances precision and recall for imbalanced data -> Option D
  4. Quick Check:

    Rare class? Use F1 score [OK]
Hint: Rare classes? Use F1 score for balance [OK]
Common Mistakes:
  • Using accuracy which hides imbalance
  • Confusing regression metrics with classification
  • Misunderstanding confusion matrix purpose
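The F1 score in the answer above is the harmonic mean of precision and recall, so it stays low unless both are reasonably high. A minimal sketch from raw counts follows; the function name `f1_from_counts` and the counts for both detectors are made up for illustration.

```python
def f1_from_counts(tp, fp, fn):
    """Return (precision, recall, f1) from true positive, false positive,
    and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical detector A: finds 8 of 10 rare objects, with 2 false alarms.
p_a, r_a, f1_a = f1_from_counts(tp=8, fp=2, fn=2)
# Hypothetical detector B: finds only 1 of 10 rare objects, with no false alarms.
p_b, r_b, f1_b = f1_from_counts(tp=1, fp=0, fn=9)

print(f"A: precision={p_a:.2f} recall={r_a:.2f} f1={f1_a:.2f}")
print(f"B: precision={p_b:.2f} recall={r_b:.2f} f1={f1_b:.2f}")
```

Detector B has perfect precision but an F1 of about 0.18, because it misses most of the rare objects.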