Prompt Engineering / GenAI (~20 mins)

Emerging trends (smaller models, edge AI) in Prompt Engineering / GenAI - ML Experiment: Train & Evaluate

Experiment - Emerging trends (smaller models, edge AI)
Problem: You have a large AI model that works well on a powerful computer but is too slow and too big to run on a small device like a smartphone or a Raspberry Pi.
Current metrics: Model size: 500 MB; inference time on edge device: 5 seconds per input; accuracy: 92%.
Issue: The model is too large and slow for edge devices, making real-time use impractical outside powerful servers.
Your Task
Reduce the model size and inference time to run efficiently on edge devices while keeping accuracy above 85%.
Do not change the dataset or task.
Keep accuracy above 85%.
Use techniques suitable for smaller models and edge AI.
Solution
import tensorflow as tf
from tensorflow import keras
from tensorflow_model_optimization.sparsity import keras as sparsity

# Load original large model
original_model = keras.models.load_model('large_model.h5')

# Apply pruning
pruning_params = {
    'pruning_schedule': sparsity.PolynomialDecay(
        initial_sparsity=0.0,
        final_sparsity=0.5,
        begin_step=0,
        end_step=1000
    )
}
pruned_model = sparsity.prune_low_magnitude(original_model, **pruning_params)

pruned_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Dummy data placeholders for example
X_train, y_train = ...  # Load your training data

# Fine-tune the pruned model; the UpdatePruningStep callback is required
# for the pruning schedule to actually be applied during training
pruned_model.fit(X_train, y_train, epochs=5, batch_size=32,
                 callbacks=[sparsity.UpdatePruningStep()])

# Strip pruning wrappers to get final pruned model
final_pruned_model = sparsity.strip_pruning(pruned_model)

# Apply post-training quantization
converter = tf.lite.TFLiteConverter.from_keras_model(final_pruned_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
quantized_tflite_model = converter.convert()

# Save quantized model
with open('quantized_model.tflite', 'wb') as f:
    f.write(quantized_tflite_model)

# Evaluate quantized model accuracy using TFLite interpreter (example code omitted for brevity)
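The omitted evaluation step can be sketched with the TFLite interpreter. The snippet below is a minimal self-contained example: it builds a tiny stand-in Keras model and random test data purely for illustration; in practice you would convert your pruned model and evaluate on your real test set.

```python
import numpy as np
import tensorflow as tf

# Tiny stand-in model (assumption: replace with your pruned model)
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),
    tf.keras.layers.Dense(8, activation='relu'),
    tf.keras.layers.Dense(3, activation='softmax'),
])

# Convert with default (dynamic-range) quantization
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# Run inference through the TFLite interpreter
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Random stand-in test data (assumption: replace with your real test set)
X_test = np.random.rand(10, 4).astype(np.float32)
y_test = np.random.randint(0, 3, size=10)

correct = 0
for x, y in zip(X_test, y_test):
    interpreter.set_tensor(input_details[0]['index'], x[np.newaxis, :])
    interpreter.invoke()
    pred = interpreter.get_tensor(output_details[0]['index'])
    correct += int(np.argmax(pred) == y)

accuracy = correct / len(y_test)
print(f"Quantized model accuracy: {accuracy:.2f}")
```

Comparing this accuracy against the original model's accuracy on the same test set tells you how much quantization cost you.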
Applied magnitude pruning to zero out 50% of the model weights, reducing size and complexity.
Used TensorFlow Lite post-training quantization to shrink the model further and speed up inference.
Continued training during pruning to recover the accuracy lost to removed weights.
Results Interpretation

Before: Model size 500MB, inference 5s, accuracy 92%

After: Model size 120MB, inference 0.8s, accuracy 87%

Pruning and quantization together cut the model size by roughly 76% (500 MB to 120 MB) and sped up inference by about 6x (5 s to 0.8 s), at the cost of a 5-point accuracy drop (92% to 87%, still above the 85% target). This shows how smaller models enable AI on devices with limited resources.
Bonus Experiment
Try knowledge distillation by training a small student model to mimic the large model's predictions and compare its size, speed, and accuracy.
💡 Hint
Use the large model's soft predictions as labels to train a smaller model that runs faster on edge devices.
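The soft-label idea can be illustrated without any framework. Below is a minimal numpy sketch of the temperature-scaled distillation loss: cross-entropy between the teacher's softened predictions and the student's. The function names, logit values, and temperature are illustrative assumptions, not part of the original experiment.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    # Temperature > 1 softens the distribution, exposing the teacher's
    # relative confidence across wrong classes ("dark knowledge")
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Cross-entropy between teacher soft targets and student predictions
    teacher_probs = softmax(teacher_logits, temperature)
    student_log_probs = np.log(softmax(student_logits, temperature) + 1e-12)
    return -np.mean(np.sum(teacher_probs * student_log_probs, axis=-1))

teacher_logits = np.array([[4.0, 1.0, 0.5]])
student_logits = np.array([[3.5, 1.2, 0.3]])
loss = distillation_loss(student_logits, teacher_logits)
print(f"Distillation loss: {loss:.4f}")
```

In a full distillation setup this term is usually mixed with the ordinary hard-label cross-entropy, and the student is trained to minimize the weighted sum.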