Model serving is a key step after training an NLP model. What is its main goal?
Think about what happens after a model is ready and you want to use it.
Model serving means making the trained NLP model available to accept new text inputs and return predictions, often in real time.
Given this simple Flask app serving an NLP sentiment model, what JSON response is returned for a POST request with text 'I love this!'?
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/predict', methods=['POST'])
def predict():
    data = request.json
    text = data.get('text', '')
    # Dummy sentiment prediction
    sentiment = 'positive' if 'love' in text else 'negative'
    return jsonify({'sentiment': sentiment})

# Assume app.run() is called elsewhere
Check if the word 'love' is in the input text.
Because 'love' appears in the input text, the code returns the JSON response {'sentiment': 'positive'}; any input without 'love' yields {'sentiment': 'negative'}.
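The handler's logic can be exercised without a running server. This is a minimal sketch that replicates the same dummy rule as a plain function (the name `predict_sentiment` is hypothetical, not part of the Flask app):

```python
def predict_sentiment(payload):
    # Same dummy rule as the Flask handler: 'love' in text -> positive
    text = payload.get('text', '')
    return {'sentiment': 'positive' if 'love' in text else 'negative'}

print(predict_sentiment({'text': 'I love this!'}))  # {'sentiment': 'positive'}
print(predict_sentiment({'text': 'Meh.'}))          # {'sentiment': 'negative'}
```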
You want to serve an NLP model that responds quickly to single user requests. Which batch size should you choose?
Think about how batch size affects response time for individual requests.
For low latency serving, batch size 1 is best because it processes each request immediately without waiting to fill a batch.
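The latency cost of batching can be illustrated with a toy model. The arrival interval and per-item compute time below are assumed numbers chosen for illustration, not measurements:

```python
# Toy model: requests arrive at a fixed interval; a batch computes only
# once it is full, and the whole batch computes together.
ARRIVAL_INTERVAL = 0.05   # seconds between requests (assumed)
COMPUTE_PER_ITEM = 0.01   # seconds of model compute per item (assumed)

def avg_latency(batch_size):
    # The i-th request in a batch waits for the remaining arrivals
    wait_times = [(batch_size - 1 - i) * ARRIVAL_INTERVAL
                  for i in range(batch_size)]
    compute = batch_size * COMPUTE_PER_ITEM
    return sum(w + compute for w in wait_times) / batch_size

for bs in (1, 4, 8):
    print(f'batch size {bs}: avg latency {avg_latency(bs):.3f} s')
```

Under these assumptions, batch size 1 gives the lowest per-request latency, because no request waits for others to arrive; larger batches trade latency for throughput.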
To monitor an NLP model serving system, which metric directly reflects user experience quality?
Think about what users notice when using a model service.
Users care about how fast the model responds, so average response latency is a key serving metric.
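One simple way to track this metric is to wrap the prediction call with a timer. A minimal sketch, assuming a hypothetical `dummy_model` stands in for the real predictor:

```python
import time

latencies = []  # per-request latencies in seconds

def timed_predict(predict_fn, text):
    # Wrap any prediction function and record its wall-clock latency
    start = time.perf_counter()
    result = predict_fn(text)
    latencies.append(time.perf_counter() - start)
    return result

# Hypothetical dummy model, for illustration only
def dummy_model(text):
    return {'sentiment': 'positive' if 'love' in text else 'negative'}

for t in ['I love this!', 'Not great']:
    timed_predict(dummy_model, t)

avg_latency_ms = 1000 * sum(latencies) / len(latencies)
print(f'average latency: {avg_latency_ms:.3f} ms')
```

In production, a monitoring system would aggregate these measurements over time (averages and tail percentiles) rather than keeping them in an in-memory list.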
Examine this snippet from a model serving function. Why does it raise a KeyError?
def serve_model(request_json):
    text = request_json['input_text']
    # Model prediction code here
    return {'prediction': 'positive'}

# Called with: serve_model({'text': 'Hello world'})
Check the keys used to access the input dictionary.
The function expects 'input_text' key, but the input dictionary has 'text' key, so accessing 'input_text' raises KeyError.
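One way to fix this is to read the key defensively with dict.get, which returns a default instead of raising KeyError. This sketch keeps the dummy prediction from the earlier example in place of the real model call:

```python
def serve_model(request_json):
    # .get avoids KeyError: fall back from 'input_text' to 'text', then ''
    text = request_json.get('input_text', request_json.get('text', ''))
    # Dummy prediction standing in for the real model call
    return {'prediction': 'positive' if 'love' in text else 'negative'}

print(serve_model({'text': 'Hello world'}))      # no KeyError now
print(serve_model({'input_text': 'I love it'}))
```

An alternative fix is to keep the strict `request_json['input_text']` lookup and instead validate the request payload up front, returning an HTTP 400 error when the expected key is missing.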