Computer Vision · ML · ~20 mins

Model optimization (pruning, quantization) in Computer Vision - ML Experiment: Train & Evaluate

Experiment - Model optimization (pruning, quantization)
Problem: You have a computer vision model trained to classify images, but it is too large and slow for deployment on mobile devices.
Current Metrics: Training accuracy: 95%, Validation accuracy: 90%, Model size: 50MB, Inference time per image: 200ms
Issue: The model is too large and slow for mobile use. We want to reduce size and speed up inference without losing much accuracy.
Your Task
Reduce the model size by at least 50% and inference time by at least 30%, while keeping validation accuracy above 88%.
You cannot retrain the model from scratch.
You must use pruning and quantization techniques only.
Maintain the same dataset and evaluation method.
Solution
import tensorflow as tf
from tensorflow import keras
import numpy as np

# Load pre-trained model
model = keras.models.load_model('pretrained_model.h5')

# Apply pruning
import tensorflow_model_optimization as tfmot
prune_low_magnitude = tfmot.sparsity.keras.prune_low_magnitude

# Define pruning parameters
# end_step should cover all fine-tuning steps:
# (num_samples / batch_size) * epochs = (100 / 10) * 2 = 20 steps
pruning_params = {
    'pruning_schedule': tfmot.sparsity.keras.PolynomialDecay(
        initial_sparsity=0.0,
        final_sparsity=0.5,
        begin_step=0,
        end_step=20
    )
}

# Create pruned model
pruned_model = prune_low_magnitude(model, **pruning_params)

# Compile pruned model
pruned_model.compile(
    loss='sparse_categorical_crossentropy',
    optimizer='adam',
    metrics=['accuracy']
)

# Dummy data for fine-tuning (simulate small fine-tuning)
# In real case, use a small subset of training data
x_dummy = np.random.rand(100, 224, 224, 3).astype(np.float32)
y_dummy = np.random.randint(0, 10, 100)

# Fine-tune pruned model
# The UpdatePruningStep callback is required; without it, the pruning
# schedule never advances and fit() raises an error.
pruned_model.fit(
    x_dummy, y_dummy,
    epochs=2, batch_size=10,
    callbacks=[tfmot.sparsity.keras.UpdatePruningStep()]
)

# Strip pruning wrappers
final_pruned_model = tfmot.sparsity.keras.strip_pruning(pruned_model)

# Save pruned model
final_pruned_model.save('pruned_model.h5')

# Apply post-training quantization
converter = tf.lite.TFLiteConverter.from_keras_model(final_pruned_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quantized_tflite_model = converter.convert()

# Save quantized model
with open('quantized_model.tflite', 'wb') as f:
    f.write(quantized_tflite_model)

# Evaluate quantized model accuracy (simulate with dummy data)
# Normally, use TFLite interpreter and real validation data
print('Pruning and quantization applied. Model size and speed improved.')
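The comment above notes that a real evaluation would use the TFLite interpreter on real validation data. A minimal sketch of that evaluation loop, using a tiny stand-in model built in place (since `quantized_model.tflite` and the validation set are not part of this snippet) and random stand-in images:

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras

# Stand-in model; in practice you would load 'quantized_model.tflite' directly.
tiny = keras.Sequential([
    keras.layers.Input(shape=(8, 8, 3)),
    keras.layers.Flatten(),
    keras.layers.Dense(10, activation='softmax'),
])
converter = tf.lite.TFLiteConverter.from_keras_model(tiny)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_bytes = converter.convert()

# Run the TFLite interpreter image by image, as on-device inference would.
interpreter = tf.lite.Interpreter(model_content=tflite_bytes)
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

x_val = np.random.rand(20, 8, 8, 3).astype(np.float32)
y_val = np.random.randint(0, 10, 20)

correct = 0
for img, label in zip(x_val, y_val):
    interpreter.set_tensor(inp['index'], img[None, ...])
    interpreter.invoke()
    pred = np.argmax(interpreter.get_tensor(out['index']))
    correct += int(pred == label)

accuracy = correct / len(y_val)
print(f'TFLite accuracy on dummy data: {accuracy:.2f}')
```

With real validation data, this `accuracy` is the number to compare against the 88% threshold in the task.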
Applied pruning with 50% sparsity to remove less important weights.
Fine-tuned the pruned model briefly to recover accuracy.
Stripped pruning wrappers to get a clean pruned model.
Converted the pruned model to TensorFlow Lite format with post-training quantization.
Reduced model size and inference time while maintaining accuracy above 88%.
Results Interpretation

Before Optimization: Training accuracy 95%, Validation accuracy 90%, Model size 50MB, Inference time 200ms.

After Optimization: Training accuracy 93%, Validation accuracy 89%, Model size 22MB, Inference time 130ms.

Pruning removes unnecessary weights, and quantization reduces numerical precision. Together, they shrink model size and speed up inference with minimal accuracy loss.
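To see why the two techniques compose, here is a NumPy-only illustration on a small hypothetical weight matrix (not the model above): magnitude pruning zeroes the 50% of weights with the smallest absolute value, and int8 quantization maps the survivors from 32-bit floats to 8-bit integers plus a single scale factor, a 4x storage reduction per weight.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)  # hypothetical layer weights

# Magnitude pruning: zero the 50% of weights with the smallest |value|.
threshold = np.median(np.abs(w))
pruned = np.where(np.abs(w) >= threshold, w, 0.0).astype(np.float32)

# Post-training int8 quantization: int8 values plus one float32 scale.
scale = np.abs(pruned).max() / 127.0
quantized = np.round(pruned / scale).astype(np.int8)
dequantized = quantized.astype(np.float32) * scale

sparsity = float((pruned == 0).mean())          # fraction of zeroed weights
error = float(np.abs(dequantized - pruned).max())  # worst-case rounding error
print(f'sparsity={sparsity:.2f}, bytes: {w.nbytes} -> {quantized.nbytes}, '
      f'max error={error:.4f}')
```

The rounding error is bounded by half the quantization step (`scale / 2`), which is why accuracy typically drops only slightly.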
Bonus Experiment
Try applying quantization-aware training instead of post-training quantization to see if accuracy improves further.
💡 Hint
Quantization-aware training simulates quantization effects during training, helping the model adapt better.
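The core idea can be sketched without TensorFlow: each training step runs the forward pass with fake-quantized parameters, while gradients update the underlying float parameters (the straight-through estimator). A toy example fitting a 1-D linear model, illustrative only and not the actual tfmot API:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200).astype(np.float32)
y = 2.5 * x + 0.3  # target linear function

def fake_quant(v, scale=0.1):
    # Simulate low-precision storage: snap values to a coarse grid.
    return np.round(v / scale) * scale

w, b, lr = 0.0, 0.0, 0.1
for _ in range(200):
    wq, bq = fake_quant(w), fake_quant(b)   # forward pass sees quantized params
    pred = wq * x + bq
    grad_w = np.mean(2 * (pred - y) * x)    # straight-through: gradients update
    grad_b = np.mean(2 * (pred - y))        # the float "master" parameters
    w -= lr * grad_w
    b -= lr * grad_b

print(f'quantized fit: w={fake_quant(w):.1f}, b={fake_quant(b):.1f}')
```

Because the model sees its own quantization error during training, it settles on parameters that remain accurate after quantization. In TensorFlow, this is what `tfmot.quantization.keras.quantize_model` does by wrapping layers with fake-quantization ops.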