Model optimization (quantization, pruning) in PyTorch - Model Metrics & Evaluation

When optimizing models by quantization or pruning, the key metrics to watch are model accuracy and inference latency. Accuracy tells us whether the model still makes good predictions after optimization. Latency shows how fast the model runs, which is often the goal of the optimization. We also check model size to see how much memory is saved. Balancing these metrics keeps the model useful while making it smaller and faster.
Original model confusion matrix:
TP=90 FP=10
FN=5 TN=95
After pruning:
TP=85 FP=15
FN=10 TN=90
Total samples = 200
Precision before pruning = 90 / (90 + 10) = 0.9
Recall before pruning = 90 / (90 + 5) = 0.947
Precision after pruning = 85 / (85 + 15) = 0.85
Recall after pruning = 85 / (85 + 10) = 0.895
This shows a slight drop in precision and recall after pruning, which is common but should be minimal.
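The arithmetic above can be checked with a few lines of Python:

```python
# Precision and recall from the confusion-matrix counts above.
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

# Before pruning: TP=90, FP=10, FN=5
p_before = precision(90, 10)   # 0.9
r_before = recall(90, 5)       # ~0.947

# After pruning: TP=85, FP=15, FN=10
p_after = precision(85, 15)    # 0.85
r_after = recall(85, 10)       # ~0.895

print(f"precision: {p_before:.3f} -> {p_after:.3f}")
print(f"recall:    {r_before:.3f} -> {r_after:.3f}")
```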
Quantization and pruning reduce model size and speed up inference but can lower accuracy. For example:
- Quantization: Converts weights from 32-bit floats to 8-bit integers. This shrinks model size and speeds up calculations but may cause a small accuracy loss.
- Pruning: Removes less important connections. This reduces size and computation but can remove useful information, lowering accuracy.
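A minimal sketch of both techniques in PyTorch, applying magnitude pruning (torch.nn.utils.prune) and dynamic quantization to a toy model. The layer sizes and the 30% pruning amount are placeholder assumptions, not values from the example above:

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy model standing in for the real one (hypothetical sizes).
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 2))

# Pruning: zero out the 30% smallest-magnitude weights in each Linear layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # make the pruning permanent

# Dynamic quantization: store Linear weights as 8-bit integers (qint8).
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)
```

After this, evaluate `quantized` on a held-out test set to measure the accuracy drop.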
We must decide how much accuracy loss is acceptable for the gain in speed and size. For mobile apps, smaller and faster models are often worth a small accuracy drop.
Good:
- Accuracy drop < 1-2% after optimization
- Model size reduced by 50% or more
- Inference latency reduced by 30% or more
Bad:
- Accuracy drops more than 5%
- Minimal size or speed improvement
- Model becomes unstable or unpredictable
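One way to verify the size-reduction target above is to compare the serialized checkpoint size before and after optimization. This is a sketch; model_size_mb is a hypothetical helper, not a PyTorch API:

```python
import os
import tempfile
import torch

def model_size_mb(model):
    """Serialized size of a model's state_dict, in megabytes."""
    with tempfile.NamedTemporaryFile() as f:
        torch.save(model.state_dict(), f)
        f.flush()
        return os.path.getsize(f.name) / 1e6
```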
Common pitfalls:
- Ignoring accuracy drop: Focusing only on size/speed can break the model.
- Data leakage: Testing on data seen during training can hide accuracy loss.
- Overfitting to optimization: Tweaking too much on test data can give false confidence.
- Not measuring latency on target device: Speed gains on desktop may not appear on mobile.
Your model after pruning has 98% accuracy but recall on the positive class dropped to 12%. Is it good for production? Why or why not?
Answer: No, it is not good. Even though overall accuracy is high, the very low recall means the model misses most positive cases. For example, if detecting fraud, missing 88% of fraud cases is dangerous. High recall is critical in such tasks.
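Hypothetical class counts (assumed for illustration, not from the question) that reproduce roughly these numbers on an imbalanced dataset show how a high accuracy can coexist with very low recall:

```python
# Imbalanced dataset: 225 positives, 9775 negatives (hypothetical counts).
tp, fn = 27, 198     # recall = 27 / 225 = 12%
tn, fp = 9775, 0     # all negatives correctly rejected
total = tp + fn + tn + fp

accuracy = (tp + tn) / total   # 0.9802
recall = tp / (tp + fn)        # 0.12
print(f"accuracy = {accuracy:.1%}, recall = {recall:.1%}")
```

Because negatives dominate the dataset, the model can ignore almost every positive case and still score near 98% accuracy.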