When you deploy a machine learning model through an API, the key metrics to watch are latency and throughput. Latency tells you how fast the model responds to a request, which is important for user experience. Throughput shows how many requests the API can handle per second, which matters for scaling. Besides these, traditional model metrics like accuracy, precision, and recall remain important to ensure the model predictions are good. But for deployment, speed and reliability are just as critical.
API-based deployment in Prompt Engineering / GenAI - Model Metrics & Evaluation
Metrics & Evaluation - API-based deployment
Which metric matters for API-based deployment and WHY
Confusion matrix example for model quality
| | Predicted Positive | Predicted Negative |
|---|--------------------|--------------------|
| Actual Positive | True Positive (TP) | False Negative (FN) |
| Actual Negative | False Positive (FP) | True Negative (TN) |
Example:
TP = 80, FP = 20, FN = 10, TN = 90
Total samples = 200
From this, you calculate:
- Precision = TP / (TP + FP) = 80 / (80 + 20) = 0.8
- Recall = TP / (TP + FN) = 80 / (80 + 10) = 0.89
- Accuracy = (TP + TN) / Total = (80 + 90) / 200 = 0.85
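The same arithmetic can be written as a small helper. This is a minimal sketch; the function name `classification_metrics` is an assumption for illustration, and the counts are the ones from the example above.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return precision, recall, and accuracy from confusion-matrix counts."""
    total = tp + fp + fn + tn
    # Guard against division by zero when a denominator is empty.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy}


metrics = classification_metrics(tp=80, fp=20, fn=10, tn=90)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
# precision: 0.80
# recall: 0.89
# accuracy: 0.85
```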
Precision vs Recall tradeoff with API deployment examples
Imagine your API is for spam detection:
- High precision means the API rarely marks good emails as spam. This avoids annoying users.
- High recall means the API catches most spam emails, even if some good emails get flagged.
For API deployment, if your API is slow but very precise, users may get frustrated waiting. If it is fast but misses many spam emails (low recall), it fails its purpose. So you balance model quality metrics with API speed.
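One way to see the precision and recall trade-off is to move the decision threshold of a spam classifier. The sketch below uses made-up scores and labels, and `precision_recall_at` is a hypothetical helper, not part of any library.

```python
# Hypothetical data: (model score that the email is spam, whether it really is spam).
scored_emails = [
    (0.95, True), (0.90, True), (0.80, False), (0.70, True),
    (0.60, True), (0.40, False), (0.30, True), (0.20, False),
]


def precision_recall_at(threshold, scored):
    """Flag every email with score >= threshold as spam, then score the result."""
    tp = sum(1 for s, is_spam in scored if s >= threshold and is_spam)
    fp = sum(1 for s, is_spam in scored if s >= threshold and not is_spam)
    fn = sum(1 for s, is_spam in scored if s < threshold and is_spam)
    # By convention here, precision is 0.0 when nothing is flagged.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


for threshold in (0.85, 0.50, 0.25):
    p, r = precision_recall_at(threshold, scored_emails)
    print(f"threshold={threshold:.2f} precision={p:.2f} recall={r:.2f}")
```

On this toy data, a strict threshold (0.85) gives precision 1.00 but recall 0.40, while a loose one (0.25) gives recall 1.00 but precision 0.71. Which end to favor depends on whether flagged good emails or missed spam costs your users more.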
What "good" vs "bad" metric values look like for API-based deployment
Good:
- Latency under 200 milliseconds per request
- Throughput of hundreds of requests per second
- Precision and recall above 0.8 for the main task
- Consistent response times without spikes
Bad:
- Latency over 1 second causing user wait
- Throughput too low to handle peak traffic
- Precision or recall below 0.5, meaning many wrong predictions
- Unstable API causing errors or timeouts
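To check values like these, you need to measure them. Below is a minimal sketch that times sequential calls and reports latency percentiles and throughput. `fake_predict` is a stand-in that just sleeps; in practice you would replace it with a call to your own endpoint. The names `benchmark` and `fake_predict` are assumptions for illustration.

```python
import statistics
import time


def fake_predict(x):
    """Stand-in for a real model call or HTTP request (hypothetical)."""
    time.sleep(0.002)  # simulate roughly 2 ms of work
    return x * 2


def benchmark(call, inputs):
    """Time each call one after another; return latency stats (ms) and throughput (req/s)."""
    inputs = list(inputs)
    latencies_ms = []
    start = time.perf_counter()
    for item in inputs:
        t0 = time.perf_counter()
        call(item)
        latencies_ms.append((time.perf_counter() - t0) * 1000)
    elapsed = time.perf_counter() - start
    cuts = statistics.quantiles(latencies_ms, n=100)  # 99 percentile cut points
    return {
        "p50_ms": cuts[49],
        "p95_ms": cuts[94],
        "max_ms": max(latencies_ms),
        "throughput_rps": len(inputs) / elapsed,
    }


report = benchmark(fake_predict, range(50))
for name, value in report.items():
    print(f"{name}: {value:.1f}")
```

Reporting p95 and the maximum alongside the median makes latency spikes visible, which an average would hide. Because the calls run sequentially, the throughput figure reflects a single client; a concurrent load test would be needed to estimate peak capacity.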
Common pitfalls in metrics for API-based deployment
- Ignoring latency: A model with great accuracy but slow API response frustrates users.
- Data leakage: When training data leaks into the test set, measured accuracy is inflated and the model then underperforms on real API traffic.
- Overfitting: High training accuracy but poor predictions on new data arriving through the API.
- Not monitoring API errors: The model may be good, but API crashes or timeouts still ruin the user experience.
- Using accuracy alone: For imbalanced data, accuracy can be misleading; precision and recall matter more.
Self-check: Your model has 98% accuracy but 12% recall on fraud. Is it good?
No, it is not good for fraud detection. The high accuracy likely comes from many normal cases correctly predicted. But the very low recall means the model misses 88% of fraud cases, which is dangerous. For fraud, catching as many frauds as possible (high recall) is critical, even if some false alarms happen. So this model needs improvement before deployment.
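A worked example makes the gap concrete. The counts below are hypothetical, chosen only so that they reproduce 98% accuracy and 12% recall on a dataset where 2% of transactions are fraud.

```python
# Hypothetical counts: 10,000 transactions, 200 of them fraudulent.
tp, fn = 24, 176    # fraud caught vs. fraud missed
fp, tn = 24, 9776   # false alarms vs. normal transactions correctly passed

total = tp + fn + fp + tn
accuracy = (tp + tn) / total
recall = tp / (tp + fn)
missed_fraud_rate = fn / (tp + fn)

# Baseline for comparison: a "model" that never predicts fraud.
baseline_accuracy = (tn + fp) / total

print(f"accuracy: {accuracy:.2f}")                    # accuracy: 0.98
print(f"recall: {recall:.2f}")                        # recall: 0.12
print(f"missed fraud: {missed_fraud_rate:.2f}")       # missed fraud: 0.88
print(f"baseline accuracy: {baseline_accuracy:.2f}")  # baseline accuracy: 0.98
```

With these counts, a model that never flags anything reaches the same 98% accuracy, which is why accuracy alone says little on imbalanced data.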
Key Result
In API-based deployment, balancing model quality (precision, recall) with API performance (latency, throughput) is key for success.
Practice
1. What is the main purpose of API-based deployment in AI?
easy
Solution
Step 1: Understand API-based deployment
API-based deployment allows AI models to be accessed remotely as services.
Step 2: Identify the main purpose
This means apps can get predictions without running the model themselves, making sharing easy.
Final Answer:
To share AI models as easy-to-use services over the internet -> Option B
Quick Check:
API deployment = share models online [OK]
Hint: API deployment means sharing models as services online [OK]
Common Mistakes:
- Confusing deployment with training
- Thinking API stores data
- Assuming API is for visualization only
2. Which Python library is commonly used to create a simple API server for deploying AI models?
easy
Solution
Step 1: Recall common Python libraries
NumPy is for math, Pandas for data, Matplotlib for plots, Flask for web servers.
Step 2: Identify the API server library
Flask is a lightweight web framework used to create APIs easily.
Final Answer:
Flask -> Option C
Quick Check:
Flask = simple API server [OK]
Hint: Flask is the go-to for simple Python APIs [OK]
Common Mistakes:
- Choosing data libraries instead of web frameworks
- Confusing Flask with data processing tools
- Thinking Matplotlib creates APIs
3. Given this Flask API code snippet, what will be the output when sending a POST request with JSON {"input": 5}?
    from flask import Flask, request, jsonify

    app = Flask(__name__)

    @app.route('/predict', methods=['POST'])
    def predict():
        # Parse the JSON request body into a Python dict
        data = request.get_json()
        x = data['input']
        result = x * 2
        return jsonify({'output': result})

    if __name__ == '__main__':
        app.run()

medium
Solution
Step 1: Understand the input and processing
The API receives JSON with key 'input' and value 5, then multiplies it by 2.
Step 2: Calculate the output
5 * 2 = 10, so the output JSON will have 'output': 10.
Final Answer:
{"output": 10} -> Option A
Quick Check:
Input 5 doubled = 10 [OK]
Hint: Multiply input by 2 as per code logic [OK]
Common Mistakes:
- Confusing input and output values
- Assuming output equals input
- Missing JSON key causes error
4. Identify the error in this Flask API code for deploying a model:
    from flask import Flask, request, jsonify

    app = Flask(__name__)

    @app.route('/predict', methods=['POST'])
    def predict():
        data = request.json()
        x = data['input']
        result = x + 1
        return jsonify({'output': result})

    if __name__ == '__main__':
        app.run()

medium
Solution
Step 1: Check how JSON is accessed in Flask
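Flask's request object exposes parsed JSON through the get_json() method or the json property; json is not a method.
Step 2: Identify the error
For a valid JSON request, request.json already holds the parsed data (a dict here), so calling request.json() raises a TypeError because a dict is not callable; the correct call is request.get_json().
Final Answer:
Using request.json() instead of request.get_json() -> Option D
Quick Check: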
Use get_json() to parse JSON [OK]
Hint: Use request.get_json(), not request.json() [OK]
Common Mistakes:
- Confusing request.json with get_json()
- Changing HTTP method unnecessarily
- Forgetting route decorator
5. You want to deploy a machine learning model via an API that predicts house prices. The model expects a JSON with features like 'size' and 'bedrooms'. Which approach best ensures your API handles missing features gracefully?
hard
Solution
Step 1: Understand missing feature handling
Missing features can cause prediction errors if not handled properly.
Step 2: Choose a robust approach
Filling missing features with default or average values allows prediction to continue safely.
Final Answer:
Fill missing features with default values before prediction -> Option A
Quick Check:
Use defaults for missing inputs [OK]
Hint: Use default values to handle missing features [OK]
Common Mistakes:
- Stopping API on missing data
- Ignoring missing features causing errors
- Restarting server unrelated to missing data
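The default-filling approach from question 5 can be sketched as a plain function that runs before the model is called. The default values, the `DEFAULT_FEATURES` name, and the `fill_missing_features` helper are all assumptions for illustration; real defaults would come from your training data.

```python
# Hypothetical defaults, e.g. training-set averages; real values depend on your data.
DEFAULT_FEATURES = {"size": 1500.0, "bedrooms": 3}


def fill_missing_features(payload, defaults=DEFAULT_FEATURES):
    """Return (features, missing): absent or null features take their default value."""
    payload = payload or {}  # tolerate a missing or empty request body
    features = {}
    missing = []
    for name, default in defaults.items():
        value = payload.get(name)
        if value is None:
            features[name] = default
            missing.append(name)
        else:
            features[name] = value
    return features, missing


features, missing = fill_missing_features({"size": 2000})
print(features)  # {'size': 2000, 'bedrooms': 3}
print(missing)   # ['bedrooms']
```

Returning the list of filled-in names lets the API tell the caller which inputs were defaulted, so a silently imputed value is not mistaken for a fully informed prediction.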
