Model optimization (pruning, quantization) in Computer Vision

Model optimization makes AI models smaller and faster with little loss of accuracy. This is essential for running models on resource-constrained devices such as phones or cameras.
```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Pruning example: wrap a Keras model so low-magnitude weights
# are progressively zeroed out during training
prune_low_magnitude = tfmot.sparsity.keras.prune_low_magnitude

# Quantization example: convert a trained Keras model to a
# quantized TensorFlow Lite model
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quantized_tflite_model = converter.convert()
```
Pruning removes the least important weights from the model, typically those with the smallest magnitudes, to make it smaller.
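As an illustration of the idea (a plain-Python sketch, not how tfmot implements it), magnitude pruning zeroes out the fraction of weights with the smallest absolute values:

```python
def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the
    smallest absolute values (illustrative sketch only)."""
    n_prune = int(len(weights) * sparsity)
    # Order weight indices by magnitude, smallest first
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    to_zero = set(order[:n_prune])
    return [0.0 if i in to_zero else w for i, w in enumerate(weights)]

weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.2]
print(magnitude_prune(weights, 0.5))  # [0.9, 0.0, 0.4, 0.0, -0.7, 0.0]
```

The zeroed weights can then be stored and computed sparsely, which is where the size and speed savings come from.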
Quantization reduces the numerical precision of weights and activations (for example, float32 to int8) to make the model faster and smaller.
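To see what this means numerically, here is a minimal sketch of affine int8 quantization (an illustration of the arithmetic, not TFLite's actual kernels): a float is mapped to an integer via a scale and zero point, and dequantizing recovers an approximation of the original value.

```python
def quantize(x, scale, zero_point):
    # Map a float to an int8 code: round(x / scale) + zero_point
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))  # clamp to the int8 range

def dequantize(q, scale, zero_point):
    return (q - zero_point) * scale

# Map the float range [-1.0, 1.0] onto int8 codes
scale = 2.0 / 255
zero_point = 0

q = quantize(0.5, scale, zero_point)
x_approx = dequantize(q, scale, zero_point)
print(q)                    # 64
print(round(x_approx, 3))   # 0.502
```

The small gap between 0.5 and 0.502 is the quantization error; it is usually tolerable, which is why quantization can shrink a model roughly 4x (float32 to int8) with only a slight accuracy drop.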
```python
import tensorflow_model_optimization as tfmot

prune_low_magnitude = tfmot.sparsity.keras.prune_low_magnitude

# Ramp sparsity from 0% to 50% over the first 1000 training steps
pruning_params = {
    'pruning_schedule': tfmot.sparsity.keras.PolynomialDecay(
        initial_sparsity=0.0,
        final_sparsity=0.5,
        begin_step=0,
        end_step=1000)
}
model_for_pruning = prune_low_magnitude(model, **pruning_params)
```
```python
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quantized_tflite_model = converter.convert()
```
This program trains a simple CNN on MNIST digits, applies pruning to reach 50% sparsity, then converts the pruned model to a quantized TensorFlow Lite model. It prints the model's accuracy before and after pruning, and the size of the quantized model.
```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot
from tensorflow.keras import layers, models

# Create a simple CNN model for computer vision
model = models.Sequential([
    layers.InputLayer(input_shape=(28, 28, 1)),
    layers.Conv2D(16, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Load MNIST data and scale pixel values to [0, 1]
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None] / 255.0
x_test = x_test[..., None] / 255.0

# Train the model briefly
model.fit(x_train, y_train, epochs=1, batch_size=128, verbose=0)

# Apply pruning: ramp sparsity from 0% to 50% over 1000 steps
prune_low_magnitude = tfmot.sparsity.keras.prune_low_magnitude
pruning_params = {
    'pruning_schedule': tfmot.sparsity.keras.PolynomialDecay(
        initial_sparsity=0.0,
        final_sparsity=0.5,
        begin_step=0,
        end_step=1000)
}
model_for_pruning = prune_low_magnitude(model, **pruning_params)
model_for_pruning.compile(optimizer='adam',
                          loss='sparse_categorical_crossentropy',
                          metrics=['accuracy'])

# Fine-tune the pruned model; the UpdatePruningStep callback is
# required to advance the pruning schedule during training
model_for_pruning.fit(x_train, y_train, epochs=1, batch_size=128,
                      callbacks=[tfmot.sparsity.keras.UpdatePruningStep()],
                      verbose=0)

# Strip pruning wrappers to get the final pruned model
model_for_export = tfmot.sparsity.keras.strip_pruning(model_for_pruning)

# Convert to a quantized TFLite model
converter = tf.lite.TFLiteConverter.from_keras_model(model_for_export)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quantized_tflite_model = converter.convert()

# Evaluate original model accuracy
loss, acc = model.evaluate(x_test, y_test, verbose=0)
print(f'Original model accuracy: {acc:.4f}')

# Evaluate pruned model accuracy (strip_pruning returns an
# uncompiled model, so compile it before evaluating)
model_for_export.compile(loss='sparse_categorical_crossentropy',
                         metrics=['accuracy'])
loss_p, acc_p = model_for_export.evaluate(x_test, y_test, verbose=0)
print(f'Pruned model accuracy: {acc_p:.4f}')

print(f'Quantized TFLite model size: {len(quantized_tflite_model) / 1024:.2f} KB')
```
Pruning works best when you fine-tune the model after pruning so it can recover lost accuracy.
Quantization can slightly reduce accuracy but greatly improves speed and size.
Always test your optimized model to ensure it still meets your accuracy needs.
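One way to make that test concrete is an acceptance check: compare the optimized model's accuracy against the baseline and reject it if the drop exceeds a budget. The helper and the 2% threshold below are illustrative choices, not recommendations.

```python
def meets_accuracy_budget(baseline_acc, optimized_acc, max_drop=0.02):
    """Return True if the optimized model lost no more than
    `max_drop` absolute accuracy versus the baseline."""
    return (baseline_acc - optimized_acc) <= max_drop

print(meets_accuracy_budget(0.985, 0.978))  # drop of 0.007 -> True
print(meets_accuracy_budget(0.985, 0.950))  # drop of 0.035 -> False
```

A check like this fits naturally in a CI pipeline, so a quantized or pruned model that degrades too far never ships to devices.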
Model optimization makes AI models smaller and faster for real devices.
Pruning removes less important parts of the model to reduce size.
Quantization reduces number precision to speed up the model and save space.