
Batch prediction vs real-time serving in MLOps - Performance Comparison

Time Complexity: Batch prediction vs real-time serving
O(n)
Understanding Time Complexity

We want to understand how the time needed to make predictions changes when using batch prediction versus real-time serving.

How does the number of predictions affect the time taken in each method?

Scenario Under Consideration

Analyze the time complexity of the following code snippet.


# Batch prediction example
predictions = []
for data_point in dataset:
    prediction = model.predict(data_point)
    predictions.append(prediction)

# Real-time serving example
# Each request triggers a single prediction
response = model.predict(single_request_data)

This code shows batch prediction processing many data points in a loop, and real-time serving handling one request at a time.
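To make the snippet above runnable on its own, here is a minimal sketch. DummyModel is a stand-in introduced for illustration; a real deployment would load a trained model instead.

```python
# DummyModel is a hypothetical stand-in for a real trained model.
class DummyModel:
    def predict(self, data_point):
        # A real model would run inference here; we just echo a label.
        return f"label-for-{data_point}"

model = DummyModel()
dataset = [1, 2, 3, 4, 5]

# Batch prediction: the loop runs once per data point -> n predict() calls.
predictions = []
for data_point in dataset:
    predictions.append(model.predict(data_point))

# Real-time serving: each incoming request triggers exactly one predict() call.
single_request_data = 42
response = model.predict(single_request_data)

print(len(predictions))  # 5
print(response)          # label-for-42
```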

Identify Repeating Operations

Identify the loops, recursive calls, and array traversals that repeat work.

  • Primary operation: Calling model.predict() for each data point.
  • How many times: In batch, once per data point in the dataset; in real-time, once per request.
How Execution Grows With Input

As the number of data points grows, batch prediction time grows proportionally, because the loop calls model.predict() once for every data point in the dataset.

Real-time serving handles one prediction at a time, so each prediction time stays about the same regardless of total requests.

Input Size (n) | Approx. Operations (Batch) | Approx. Operations (Real-time)
10             | 10 predictions             | 1 prediction per request
100            | 100 predictions            | 1 prediction per request
1000           | 1000 predictions           | 1 prediction per request

Pattern observation: Batch time grows with number of data points; real-time time per prediction stays constant.
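The pattern in the table can be verified by counting predict() calls directly. CountingModel below is a hypothetical stub introduced only to instrument the call count.

```python
# Count predict() calls to confirm the linear pattern.
class CountingModel:
    def __init__(self):
        self.calls = 0

    def predict(self, data_point):
        self.calls += 1  # record each prediction
        return 0         # placeholder result

for n in (10, 100, 1000):
    model = CountingModel()
    for data_point in range(n):
        model.predict(data_point)
    print(n, model.calls)  # call count equals n: 10, 100, 1000
```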

Final Time Complexity

Time Complexity: O(n)

This means batch prediction time grows linearly with the number of data points, while real-time serving handles each prediction individually with constant time.

Common Mistake

[X] Wrong: "Real-time serving takes longer as more requests come in because it processes all requests together like batch."

[OK] Correct: Real-time serving processes each request separately, so the time per prediction stays about the same regardless of total requests.

Interview Connect

Understanding how prediction time scales helps you explain trade-offs between batch and real-time systems clearly, a useful skill in many practical MLOps discussions.

Self-Check

What if we parallelize batch prediction to run multiple predictions at the same time? How would the time complexity change?
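As one way to explore this question (a sketch, not an answer key), the batch can be split across worker threads. The total work is still O(n) predict() calls, but with p workers the wall-clock time is roughly O(n / p). DummyModel here is again a hypothetical stand-in for a real model.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a real trained model.
class DummyModel:
    def predict(self, data_point):
        return data_point * 2  # placeholder for real inference

model = DummyModel()
dataset = list(range(1000))

# Four workers share the batch; each handles about n/4 predictions.
with ThreadPoolExecutor(max_workers=4) as pool:
    predictions = list(pool.map(model.predict, dataset))

print(len(predictions))  # 1000 results, in input order
```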