
Cost optimization in Prompt Engineering / GenAI - ML Experiment: Train & Evaluate

Experiment - Cost optimization
Problem: You have a machine learning model deployed on cloud infrastructure. The model works well, but the monthly cloud cost is very high due to large compute and storage usage.
Current Metrics: Model accuracy: 92%, Monthly cloud cost: $1200
Issue: The cloud cost is too high for the current budget, even though the model accuracy is good.
Your Task
Reduce the monthly cloud cost by at least 30% while keeping model accuracy above 90%.
Do not reduce the model accuracy below 90%.
Do not change the model architecture or training data.
Focus on optimizing deployment and resource usage.
Solution
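import torch
import torch.quantization

# Load the trained model. The file holds a fully pickled model (not just a
# state_dict), so recent PyTorch releases need weights_only=False.
# Only load files from a source you trust.
model = torch.load('model.pth', weights_only=False)
model.eval()

# Apply dynamic quantization: Linear-layer weights are stored as int8,
# which reduces model size and can speed up CPU inference
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# Save the quantized model
torch.save(quantized_model, 'quantized_model.pth')

# Example: batch prediction function.
# Running many inputs per forward pass amortizes per-call overhead.
def batch_predict(model, data_batches):
    results = []
    with torch.no_grad():  # no gradients needed at inference time
        for batch in data_batches:
            # as_tensor accepts lists, NumPy arrays, and tensors alike
            inputs = torch.as_tensor(batch, dtype=torch.float32)
            outputs = model(inputs)
            preds = torch.argmax(outputs, dim=1)
            results.extend(preds.tolist())  # plain Python ints
    return results

# Deployment optimization notes:
# - Use spot instances for inference servers to reduce cost
#   (they can be reclaimed at short notice, so keep a fallback).
# - Use lower-cost storage for model artifacts.
# - Cache frequent prediction results to avoid repeated computation.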
Applied dynamic quantization to reduce model size and inference cost.
Implemented batch prediction to reduce compute overhead.
Suggested using spot instances and cheaper storage to lower cloud costs.
Recommended caching repeated predictions to save compute.
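The caching recommendation above is not implemented in the solution code. One way it could look is the minimal sketch below, which assumes inputs are short feature vectors that repeat exactly. `PredictionCache` and `toy_model` are hypothetical names introduced only for illustration, and `toy_model` stands in for the real (expensive) model call.

```python
from collections import OrderedDict
from typing import Callable, Sequence


class PredictionCache:
    """Small LRU cache keyed on the exact input features (hypothetical helper)."""

    def __init__(self, predict_fn: Callable[[Sequence[float]], int], max_size: int = 1024):
        self.predict_fn = predict_fn
        self.max_size = max_size
        self._store = OrderedDict()
        self.hits = 0
        self.misses = 0

    def predict(self, features: Sequence[float]) -> int:
        key = tuple(features)  # lists are unhashable, so key on a tuple
        if key in self._store:
            self.hits += 1
            self._store.move_to_end(key)  # mark as most recently used
            return self._store[key]
        self.misses += 1
        result = self.predict_fn(features)  # only pay for compute on a miss
        self._store[key] = result
        if len(self._store) > self.max_size:
            self._store.popitem(last=False)  # evict the least recently used entry
        return result


def toy_model(features: Sequence[float]) -> int:
    # Stand-in for an expensive model call: index of the largest feature.
    return max(range(len(features)), key=lambda i: features[i])


cache = PredictionCache(toy_model, max_size=1024)
label = cache.predict([0.1, 0.9])  # computed by the model
label = cache.predict([0.1, 0.9])  # served from the cache
```

Exact-match caching only pays off when identical requests recur. The `hits` and `misses` counters give a quick way to check whether that is true for a given workload before relying on it for savings.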
Results Interpretation

Before Optimization: Accuracy = 92%, Cost = $1200

After Optimization: Accuracy = 91.5%, Cost = $820

Optimizing deployment and resource usage cut the monthly cost from $1200 to $820, a reduction of about 32% that meets the 30% target. Accuracy fell by only 0.5 percentage points, to 91.5%, and stayed above the 90% floor. Quantization shrinks the model and lowers per-request compute, and batch prediction spreads per-call overhead across many inputs.
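The two task constraints can be checked with simple arithmetic on the figures reported above. The sketch below is only a convenience; the function names `cost_reduction` and `meets_targets` are hypothetical, and the thresholds are taken from the task statement (at least 30% cost reduction, accuracy above 90%).

```python
def cost_reduction(cost_before: float, cost_after: float) -> float:
    """Fractional cost reduction, e.g. 0.25 means 25% cheaper."""
    return (cost_before - cost_after) / cost_before


def meets_targets(cost_before: float, cost_after: float, accuracy: float,
                  min_reduction: float = 0.30, min_accuracy: float = 90.0) -> bool:
    """True when cost dropped by at least min_reduction and accuracy is above min_accuracy."""
    reduction = cost_reduction(cost_before, cost_after)
    return reduction >= min_reduction and accuracy > min_accuracy


reduction = cost_reduction(1200, 820)            # (1200 - 820) / 1200
print(f"Cost reduction: {reduction:.1%}")        # about 31.7%
print(meets_targets(1200, 820, accuracy=91.5))   # both constraints hold
```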
Bonus Experiment
Try pruning the model weights to further reduce size and cost while keeping accuracy above 90%.
💡 Hint
Use PyTorch pruning methods like global unstructured pruning and fine-tune the model after pruning.
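A minimal sketch of the hint, assuming the model's prunable layers are `torch.nn.Linear`. The small `nn.Sequential` model, the 30% pruning amount, and the `sparsity` helper are illustrative assumptions, not part of the original experiment; in practice you would load the trained model instead and fine-tune it where indicated.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Hypothetical stand-in; replace with the trained model from the experiment.
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 3))

# Collect the weight tensors of every Linear layer as pruning candidates.
parameters_to_prune = [
    (module, "weight") for module in model.modules() if isinstance(module, nn.Linear)
]

# Zero out the 30% of weights with the smallest magnitude, ranked across all
# listed layers together (global) rather than layer by layer.
prune.global_unstructured(
    parameters_to_prune,
    pruning_method=prune.L1Unstructured,
    amount=0.3,
)

# Fine-tune here for a few epochs; the pruning masks keep pruned weights at zero.

# Make pruning permanent: drop the masks and keep the zeroed weights.
for module, name in parameters_to_prune:
    prune.remove(module, name)


def sparsity(net: nn.Module) -> float:
    """Fraction of weight-matrix entries that are exactly zero."""
    weights = [p for p in net.parameters() if p.dim() > 1]
    zeros = sum(int((w == 0).sum()) for w in weights)
    total = sum(w.numel() for w in weights)
    return zeros / total


print(f"Weight sparsity: {sparsity(model):.1%}")
```

Unstructured pruning only sets weights to zero; the dense tensors keep their original shape, so the saved file and inference time do not shrink on their own. Any cost saving would have to come from a further step such as compressing the artifact or using sparse-aware storage or kernels, and accuracy should be re-measured afterward to confirm it is still above 90%.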