Model optimization (pruning, quantization) in Computer Vision - Model Metrics & Evaluation

Which metric matters for Model optimization (pruning, quantization) and WHY

When optimizing models by pruning or quantization, the key metric to watch is task performance: classification accuracy for a classifier, mean average precision (mAP) for an object detector, and so on. Pruning and quantization reduce model size and speed up inference, but applied too aggressively they can hurt accuracy. The goal is to keep accuracy close to the original model's while making the model smaller and faster.

Model size (memory footprint) and inference latency (time per prediction) are the other metrics to track, because they measure what the optimization gains.
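Both benefit metrics can be recorded with a few lines of code. The sketch below is plain Python and assumes the model can be called as a function `predict(example)`; `model_size_mb` and `median_latency_ms` are hypothetical helper names, and the size estimate counts weight storage only.

```python
import statistics
import time


def model_size_mb(num_params, bits_per_weight):
    """Rough weight-storage size in megabytes.

    Counts only the weights: file headers, quantization scales, and sparse-index
    overhead are ignored. Unstructured pruning saves space only if the zeros are
    stored in a sparse format, so pass the number of weights actually stored.
    """
    return num_params * bits_per_weight / 8 / 1e6


def median_latency_ms(predict, example, warmup=5, runs=50):
    """Median wall-clock time of one predict(example) call, in milliseconds."""
    for _ in range(warmup):  # untimed warm-up calls (caches, lazy initialization)
        predict(example)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(example)
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


# Hypothetical 25M-parameter model: float32 weights vs int8 weights
print(model_size_mb(25_000_000, 32))  # 100.0
print(model_size_mb(25_000_000, 8))   # 25.0

# Stand-in "model" (a plain function), only to show the timing call
print(median_latency_ms(lambda xs: sum(x * x for x in xs), list(range(10_000))))
```

The median is used instead of the mean so that occasional slow runs (garbage collection, OS scheduling) do not dominate the result. Latency depends on hardware, so time the model on the device it will actually run on.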

Confusion matrix example: before and after pruning (binary classifier)

    Original model confusion matrix:
      TP=90  FP=10
      FN=5   TN=95

    After pruning:
      TP=85  FP=15
      FN=10  TN=90

    Total samples = 200

    Precision before pruning = 90 / (90 + 10) = 0.9
    Recall before pruning = 90 / (90 + 5) = 0.947

    Precision after pruning = 85 / (85 + 15) = 0.85
    Recall after pruning = 85 / (85 + 10) = 0.895

Both precision and recall drop by about 5 points after pruning. Some loss like this is common, and it grows as pruning becomes more aggressive.
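The arithmetic above can be checked with a short Python sketch. `confusion_metrics` is a hypothetical helper name; the counts are the ones from the example, and accuracy is included because it is the headline metric discussed earlier.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Precision, recall, and accuracy from binary confusion-matrix counts."""
    precision = tp / (tp + fp)  # of predicted positives, how many were right
    recall = tp / (tp + fn)     # of actual positives, how many were found
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, accuracy


before = confusion_metrics(tp=90, fp=10, fn=5, tn=95)
after = confusion_metrics(tp=85, fp=15, fn=10, tn=90)

for name, b, a in zip(("precision", "recall", "accuracy"), before, after):
    print(f"{name}: {b:.3f} -> {a:.3f} ({a - b:+.3f})")
# precision: 0.900 -> 0.850 (-0.050)
# recall: 0.947 -> 0.895 (-0.053)
# accuracy: 0.925 -> 0.875 (-0.050)
```

Accuracy falls from 92.5% to 87.5%, a drop of 5 percentage points.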

Tradeoff: Accuracy vs Model Size and Speed

Pruning and quantization reduce model size and speed up predictions but can lower accuracy.

For example, a mobile app needs a small, fast model. It may accept a small accuracy drop to run smoothly on phones.

But a medical image model must keep very high accuracy, so pruning or quantization must be gentle or avoided.

Choosing the right balance depends on the use case: speed and size vs accuracy.
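One way to make this tradeoff explicit is to set an accuracy budget for the use case and pick the most compressed candidate that stays within it. This is a minimal sketch, not a standard API: `Candidate` and `pick_most_compressed` are hypothetical names, and the accuracies and sizes are assumed to come from your own measured pruning or quantization runs.

```python
from typing import NamedTuple, Optional, Sequence


class Candidate(NamedTuple):
    name: str
    accuracy: float  # fraction on a held-out set, e.g. 0.95 for 95%
    size_mb: float


def pick_most_compressed(baseline_accuracy: float,
                         candidates: Sequence[Candidate],
                         max_drop: float) -> Optional[Candidate]:
    """Smallest candidate whose accuracy drop is at most max_drop, else None."""
    # Tiny tolerance so that, e.g., 0.95 - 0.93 counts as a 0.02 drop despite float rounding.
    within_budget = [c for c in candidates
                     if baseline_accuracy - c.accuracy <= max_drop + 1e-9]
    if not within_budget:
        return None  # nothing fits the budget: keep the unoptimized model
    return min(within_budget, key=lambda c: c.size_mb)
```

A phone app might pass a budget of a couple of points (`max_drop=0.02`), while a medical imaging model would use a much tighter budget. A `None` result means no optimized variant is acceptable and the original model should be kept.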

What "good" vs "bad" metric values look like for Model optimization

Good: accuracy drops by less than 1-2 percentage points after pruning or quantization, while model size shrinks by 50% or more and inference becomes significantly faster.

Bad: accuracy drops by more than 5 percentage points, or predictions become unstable, even if size and speed improve. In that case the optimization has hurt the model too much.
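The thresholds above can be written as a small check. This is a sketch: `judge_optimization` is a hypothetical helper, accuracy changes are measured in percentage points, `min_speedup=1.5` is an assumed stand-in for "significantly faster" (the text gives no number), and instability has to be checked separately because a single accuracy number does not reveal it.

```python
def judge_optimization(acc_before, acc_after, size_reduction, speedup,
                       good_drop=0.02, bad_drop=0.05,
                       min_size_reduction=0.5, min_speedup=1.5):
    """Label an optimization result 'good', 'bad', or 'borderline'.

    Accuracies and size_reduction are fractions (0.95 = 95%, 0.6 = 60% smaller);
    speedup is old latency / new latency (3.0 = three times faster).
    """
    drop = acc_before - acc_after  # absolute drop, e.g. 0.10 = 10 percentage points
    if drop > bad_drop:
        return "bad"  # too much accuracy lost, whatever the size/speed gains
    if (drop < good_drop and size_reduction >= min_size_reduction
            and speedup >= min_speedup):  # min_speedup is an assumed threshold
        return "good"
    return "borderline"  # between the two rules: decide based on the use case


# Numbers from the self-check question below: 95% -> 85%, 60% smaller, 3x faster
print(judge_optimization(0.95, 0.85, size_reduction=0.60, speedup=3.0))  # bad
```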

Common pitfalls in metrics for Model optimization

Ignoring accuracy drop: measuring only size and speed and missing that accuracy fell too much.
Not testing on real data: evaluating the optimized model on training data can hide accuracy loss on new data; use a held-out set that resembles real inputs.
Over-pruning: removing too many weights causes a large accuracy loss.
Quantization errors: using too few bits per weight (too low precision) can make predictions unstable.
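The last two pitfalls can be seen on a toy weight vector. The sketch below uses plain Python with random stand-in weights, not a trained model: `magnitude_prune` zeros the smallest-magnitude weights (unstructured magnitude pruning), and `quantize_dequantize` applies symmetric per-tensor uniform quantization to signed integers and maps the result back to floats. Both names are hypothetical; frameworks ship their own implementations, such as PyTorch's `torch.nn.utils.prune` module. Weight error is only a proxy, and the real test is accuracy on held-out data.

```python
import math
import random


def magnitude_prune(weights, sparsity):
    """Set the `sparsity` fraction of weights with the smallest |w| to zero."""
    k = int(len(weights) * sparsity)  # how many weights to remove
    smallest_first = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in smallest_first[:k]:
        pruned[i] = 0.0
    return pruned


def quantize_dequantize(weights, bits):
    """Round weights to signed `bits`-bit integer levels (bits >= 2), then scale back to floats."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits, 7 for 4 bits, 1 for 2 bits
    largest = max(abs(w) for w in weights)
    if largest == 0.0:
        return list(weights)
    scale = largest / qmax  # one scale for the whole tensor (per-tensor quantization)
    return [round(w / scale) * scale for w in weights]


def relative_error(original, approx):
    """||original - approx|| / ||original|| using Euclidean norms."""
    diff = math.sqrt(sum((o - a) ** 2 for o, a in zip(original, approx)))
    return diff / math.sqrt(sum(o * o for o in original))


rng = random.Random(0)  # fixed seed so the run is repeatable
weights = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # stand-in for one layer's weights

for sparsity in (0.3, 0.5, 0.7, 0.9):
    err = relative_error(weights, magnitude_prune(weights, sparsity))
    print(f"pruned {sparsity:.0%} of weights: relative error {err:.3f}")

for bits in (8, 4, 2):
    err = relative_error(weights, quantize_dequantize(weights, bits))
    print(f"{bits}-bit quantization: relative error {err:.3f}")
```

The error grows steadily with sparsity and jumps sharply at very low bit widths, which mirrors the over-pruning and low-precision pitfalls. The exact numbers depend on the random stand-in weights and say nothing about any real model.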

Self-check question

Your model optimization reduced size by 60% and sped up inference by 3x, but accuracy dropped from 95% to 85%. Is this good for production? Why or why not?

Answer: Usually no. A 10-point accuracy drop (95% to 85%) is large and may hurt user experience or safety. The size and speed gains are welcome, but they do not make up for that much accuracy loss. Try less aggressive pruning or quantization to keep accuracy higher.

Key Result

Model optimization aims to reduce size and speed up inference while keeping accuracy loss minimal (ideally under 2 percentage points).

Practice

1. What is the main goal of model pruning in computer vision?

easy

A. To remove less important parts of the model to reduce size

B. To increase the number of layers in the model

C. To add more training data for better accuracy

D. To convert the model to a different programming language

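Solution

Step 1: Understand pruning concept. Pruning removes weights, neurons, or channels that contribute little to the model's output.

Step 2: Identify pruning goal. Removing those parts makes the model smaller and can speed up inference. Adding layers (B), adding training data (C), and converting to another programming language (D) are not pruning.

Final Answer: A. To remove less important parts of the model to reduce size

Quick Check: pruning = removing less important parts to make the model smaller.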
