Which of the following best describes batch inference in machine learning?
Think about when predictions are made: all at once or one by one?
Batch inference means processing many inputs together, often offline, to produce predictions in bulk. Real-time inference, by contrast, produces a prediction for each input as soon as it arrives.
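The contrast can be sketched in a few lines of Python. The `predict` function below is a hypothetical stand-in for a trained model (it just doubles its input); only the calling pattern matters:

```python
# Hypothetical stand-in for a trained model: prediction = 2 * input.
def predict(x):
    return 2 * x

# Batch inference: score a whole dataset at once, typically offline.
def batch_inference(inputs):
    return [predict(x) for x in inputs]

# Real-time inference: score a single input as soon as it arrives.
def real_time_inference(x):
    return predict(x)

print(batch_inference([1, 2, 3]))  # [2, 4, 6]
print(real_time_inference(5))      # 10
```

The difference is not the model but when and how many predictions are requested: all at once versus one per request.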
Which scenario is best suited for real-time inference?
Consider when the prediction is needed: instantly or later?
Real-time inference is used when predictions must be made instantly, such as recommending products right after user actions.
Given a model that takes 0.01 seconds to predict one input, what is the expected latency for batch inference processing 1000 inputs at once compared to real-time inference processing one input at a time?
Multiply prediction time by number of inputs for batch; real-time is per input.
Batch inference processes all inputs together, so total time is 0.01s * 1000 = 10s. Real-time inference processes one input at a time, so latency per input is 0.01s.
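The arithmetic from the answer can be checked directly (the numbers below are the ones given in the question):

```python
per_input = 0.01           # seconds per prediction (given)
n = 1000                   # number of inputs in the batch (given)

batch_total = per_input * n    # time to finish the whole batch
real_time_latency = per_input  # latency for a single request

print(batch_total)         # 10.0 seconds for 1000 inputs
print(real_time_latency)   # 0.01 seconds per input
```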
What is the bug in this Python code when performing real-time inference?
def predict_real_time(model, inputs):
    results = []
    for input in inputs:
        prediction = model.predict(input)
    results.append(prediction)
    return results
# inputs is a list of data points
Check whether the prediction is added to results inside the loop.
The append is outside the loop, so only the last prediction is added to results; all earlier predictions are lost. No exception is raised, but the returned list is incomplete.
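A corrected version of the loop, sketched with a minimal stand-in model (the `DoubleModel` class here is hypothetical, added only to make the example runnable):

```python
class DoubleModel:
    """Hypothetical stand-in model: prediction is just 2 * input."""
    def predict(self, x):
        return 2 * x

def predict_real_time(model, inputs):
    results = []
    for item in inputs:             # avoid shadowing the builtin `input`
        prediction = model.predict(item)
        results.append(prediction)  # append inside the loop, once per input
    return results

print(predict_real_time(DoubleModel(), [1, 2, 3]))  # [2, 4, 6]
```

With the append indented into the loop body, every prediction is collected, not just the last one.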
You have two model architectures: Model A is large and accurate but slow; Model B is smaller and faster but less accurate. For a chatbot requiring instant replies, which model is best?
Consider the trade-off between speed and accuracy for instant replies.
Real-time inference requires fast predictions to keep conversations smooth, so a smaller, faster model is preferred despite some accuracy loss.