Batch prediction vs real-time serving in MLOps - CLI Comparison

Introduction
When you have a machine learning model, you can use it to make predictions in two main ways. Batch prediction scores many data points in one scheduled job, while real-time serving answers each request as soon as it arrives. Choosing the right one keeps costs down and keeps your app responsive. Typical situations for each:
When you want to analyze a large set of customer data overnight to find trends.
When your app needs to recommend a product instantly when a user visits a page.
When you have limited computing resources and want to run predictions in bulk to save cost.
When you need to respond quickly to user inputs, like fraud detection during a transaction.
When you want to update predictions regularly but not instantly, like daily sales forecasts.
Commands
This command runs batch prediction with MLflow. It reads many rows from a CSV file and writes the predictions to an output file, which MLflow formats as JSON. Use it when you want to process data in bulk.
Terminal
mlflow models predict -m models:/my-model/Production -i batch_input.csv -t csv -o batch_output.json
Expected Output (illustrative; exact messages vary by MLflow version)
Prediction completed successfully. Output saved to batch_output.json
-m - Specifies the model to use
-i - Input file with data to predict
-t - Content type of the input file (json by default, so csv must be stated here)
-o - Output file to save predictions
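The bulk pattern behind this command can be sketched in plain Python. This is a minimal illustration under stated assumptions, not MLflow's implementation: `predict_rows` is a hypothetical stand-in for a real model, and the column names and chunk size are made up for the example.

```python
import csv
import io
from itertools import islice


def predict_rows(rows):
    """Hypothetical stand-in model: label 1 when the first feature exceeds 5.0."""
    return [1 if row[0] > 5.0 else 0 for row in rows]


def batch_predict(reader, writer, chunk_size=1000):
    """Read feature rows in chunks, score each chunk, write one prediction per row."""
    next(reader)  # skip the header row of the input
    writer.writerow(["prediction"])
    total = 0
    while True:
        # Pull at most chunk_size rows so memory use stays bounded on large files.
        chunk = [[float(v) for v in row] for row in islice(reader, chunk_size)]
        if not chunk:
            break
        for pred in predict_rows(chunk):
            writer.writerow([pred])
        total += len(chunk)
    return total


# In-memory CSV keeps the sketch self-contained; a real job would open files instead.
source = io.StringIO("f1,f2,f3,f4\n5.1,3.5,1.4,0.2\n4.9,3.0,1.4,0.2\n6.3,3.3,6.0,2.5\n")
sink = io.StringIO()
count = batch_predict(csv.reader(source), csv.writer(sink), chunk_size=2)
print(count)                    # 3
print(sink.getvalue().split())  # ['prediction', '1', '0', '1']
```

Scoring in chunks lets one job work through a file larger than memory, which is the main reason batch jobs are cheap to run on limited hardware.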
This command starts a real-time prediction server on port 1234. It waits for single prediction requests and responds immediately. Use this when your app needs instant answers.
Terminal
mlflow models serve -m models:/my-model/Production -p 1234
Expected Output
2024/06/01 12:00:00 Starting MLflow model server for model 'my-model' on port 1234
-m - Specifies the model to serve
-p - Port number for the server
This command sends a single data point to the real-time server for prediction. It shows how your app can ask for one prediction at a time. MLflow 2.x servers expect the rows under a top-level key such as inputs or dataframe_split.
Terminal
curl -X POST -H 'Content-Type: application/json' -d '{"inputs": [[5.1, 3.5, 1.4, 0.2]]}' http://localhost:1234/invocations
Expected Output
{"predictions": [0]}
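An application would normally send this request from code rather than from curl. The sketch below uses only the Python standard library. It assumes an MLflow 2.x server, which accepts feature rows under the `inputs` key; the function names `build_payload` and `request_prediction` are hypothetical helpers for this example.

```python
import json
import urllib.request


def build_payload(rows):
    """Serialize feature rows under the "inputs" key (assumed MLflow 2.x format)."""
    return json.dumps({"inputs": rows}).encode("utf-8")


def request_prediction(url, rows, timeout=5.0):
    """POST the rows to the scoring endpoint and return the decoded JSON reply."""
    request = urllib.request.Request(
        url,
        data=build_payload(rows),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Show the request body without needing a running server.
print(build_payload([[5.1, 3.5, 1.4, 0.2]]).decode("utf-8"))

# With the server from the previous command running, a call would look like:
# request_prediction("http://localhost:1234/invocations", [[5.1, 3.5, 1.4, 0.2]])
```

The explicit timeout matters in real-time paths: a caller that waits forever on a slow model defeats the purpose of instant serving.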
Key Concept

Batch prediction processes many data points at once for efficiency, while real-time serving answers single requests instantly for responsiveness.

Common Mistakes
Trying to use batch prediction commands for real-time needs.
Batch prediction waits for all data before responding, causing delays in real-time apps.
Use a real-time serving command or API to get instant predictions.
Starting a real-time server but not sending requests in the correct format.
The server expects JSON in a specific structure; a malformed request returns an error instead of a prediction.
Send prediction requests as JSON with the correct data structure.
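One way to catch a badly shaped request before it reaches the server is to check it on the client. The sketch below is a hypothetical helper, not part of MLflow, and it assumes the model takes rows of plain numbers like the example above.

```python
def validate_rows(rows):
    """Raise ValueError unless rows is a non-empty list of equal-length numeric lists."""
    if not isinstance(rows, list) or not rows:
        raise ValueError("rows must be a non-empty list")
    width = None
    for row in rows:
        if not isinstance(row, list) or not row:
            raise ValueError("each row must be a non-empty list")
        for value in row:
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise ValueError("every feature must be a number")
        if width is None:
            width = len(row)
        elif len(row) != width:
            raise ValueError("all rows must have the same number of features")
    return rows


print(validate_rows([[5.1, 3.5, 1.4, 0.2]]))  # a nested list passes unchanged

try:
    validate_rows([5.1, 3.5, 1.4, 0.2])  # flat list: a common mistake
except ValueError as exc:
    print("rejected:", exc)
```

A single data point still has to be wrapped in an outer list, which is the mistake the flat-list case demonstrates.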
Summary
Batch prediction runs many inputs at once and saves results to a file.
Real-time serving starts a server that answers one prediction request at a time.
Use batch for large offline jobs and real-time for instant user interactions.

Practice

1. What is the main difference between batch prediction and real-time serving in machine learning?
easy
A. Batch prediction is faster than real-time serving for single inputs.
B. Real-time serving is used only for training models.
C. Batch prediction processes many inputs at once, while real-time serving processes one input at a time.
D. Batch prediction requires internet connection, real-time serving does not.

Solution

  1. Step 1: Understand batch prediction

    Batch prediction processes a large number of inputs together, usually offline or in scheduled jobs.
  2. Step 2: Understand real-time serving

    Real-time serving handles one input at a time to provide instant predictions.
  3. Final Answer:

    Batch prediction processes many inputs at once, while real-time serving processes one input at a time. -> Option C
  4. Quick Check:

    Batch = many inputs, Real-time = one input [OK]
Hint: Batch = many inputs; real-time = one input fast [OK]
Common Mistakes:
  • Confusing batch with real-time speed
  • Thinking real-time is for training
  • Assuming batch needs internet
2. Which of the following is the correct way to describe real-time serving in a sentence?
easy
A. Real-time serving provides predictions instantly for each individual input.
B. Real-time serving delays predictions until batch processing is complete.
C. Real-time serving is only used for model training.
D. Real-time serving processes data in large groups at scheduled times.

Solution

  1. Step 1: Identify real-time serving purpose

    Real-time serving is designed to give instant predictions for each input as it arrives.
  2. Step 2: Eliminate incorrect options

    Options B, C, and D describe batch processing or training, not real-time serving.
  3. Final Answer:

    Real-time serving provides predictions instantly for each individual input. -> Option A
  4. Quick Check:

    Instant prediction per input = real-time serving [OK]
Hint: Real-time = instant single input prediction [OK]
Common Mistakes:
  • Mixing batch processing with real-time
  • Thinking real-time is for training
  • Confusing delay with instant response
3. Consider this Python pseudocode for batch prediction and real-time serving:
def batch_predict(data_list):
    return [model.predict(x) for x in data_list]

def real_time_predict(single_input):
    return model.predict(single_input)

batch_result = batch_predict([1, 2, 3])
real_time_result = real_time_predict(4)
print(batch_result, real_time_result)
What will be printed?
medium
A. pred1 pred2 pred3 pred4
B. [pred1, pred2, pred3] pred4
C. [pred1, pred2, pred3, pred4] None
D. Error because batch_predict expects a single input

Solution

  1. Step 1: Understand batch_predict output

    batch_predict returns a list of predictions for each input in data_list, so batch_result is a list [pred1, pred2, pred3].
  2. Step 2: Understand real_time_predict output

    real_time_predict returns a single prediction for the single input 4, so real_time_result is pred4.
  3. Final Answer:

    [pred1, pred2, pred3] pred4 -> Option B
  4. Quick Check:

    Batch returns list, real-time returns single prediction [OK]
Hint: Batch returns list; real-time returns single value [OK]
Common Mistakes:
  • Thinking batch returns single prediction
  • Confusing print output format
  • Assuming error due to input type
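The pseudocode in question 3 can be made runnable by supplying a model. The sketch below uses a hypothetical `StubModel` that returns the string `pred<x>` for each input; a real model would return numbers or labels instead.

```python
class StubModel:
    """Hypothetical stand-in for a trained model."""

    def predict(self, x):
        return f"pred{x}"


model = StubModel()


def batch_predict(data_list):
    # One prediction per element, collected into a list.
    return [model.predict(x) for x in data_list]


def real_time_predict(single_input):
    # A single prediction for a single input.
    return model.predict(single_input)


batch_result = batch_predict([1, 2, 3])
real_time_result = real_time_predict(4)
print(batch_result, real_time_result)  # ['pred1', 'pred2', 'pred3'] pred4
```

Python shows quotes around the strings inside the list, so the real output differs slightly from the simplified form in option B, but the shape is the same: a list followed by a single value.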
4. You have this code snippet for real-time serving:
def real_time_predict(input):
    predictions = []
    for x in input:
        predictions.append(model.predict(x))
    return predictions

result = real_time_predict(5)
print(result)
What is the error, and how do you fix it?
medium
A. Error: input is not iterable; fix by passing a list like [5].
B. Error: model.predict is undefined; fix by importing model.
C. No error; code runs correctly.
D. Error: predictions list is not returned; fix by adding return statement.

Solution

  1. Step 1: Identify input type issue

    The function expects input to be iterable (like a list), but 5 is an integer, so the for loop raises TypeError: 'int' object is not iterable.
  2. Step 2: Fix by passing iterable

    Passing [5] (a list with one element) makes the loop work correctly.
  3. Final Answer:

    Error: input is not iterable; fix by passing a list like [5]. -> Option A
  4. Quick Check:

    Non-iterable input causes error [OK]
Hint: Check if input is iterable for loops [OK]
Common Mistakes:
  • Passing single value instead of list
  • Ignoring error message about iteration
  • Assuming model.predict missing
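Besides passing `[5]`, the function itself can be made tolerant of a bare value. The sketch below reproduces the error from question 4 and shows one possible fix. `StubModel` is a hypothetical stand-in that doubles its input, and the names `original_predict` and `safe_predict` are introduced only for this example.

```python
class StubModel:
    """Hypothetical stand-in model: doubles its input."""

    def predict(self, x):
        return x * 2


model = StubModel()


def original_predict(inputs):
    """The function from the question, unchanged apart from its names."""
    predictions = []
    for x in inputs:
        predictions.append(model.predict(x))
    return predictions


def safe_predict(inputs):
    """Accept a single value or a list of values and always return a list."""
    if not isinstance(inputs, (list, tuple)):
        inputs = [inputs]  # wrap a bare value so the loop can iterate
    return [model.predict(x) for x in inputs]


try:
    original_predict(5)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)  # 'int' object is not iterable

print(safe_predict(5))       # [10]
print(safe_predict([5, 6]))  # [10, 12]
```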
5. A company wants to predict customer churn. They have 1 million customers and want to update predictions once a day. They also want to make instant offers to customers who call support. Which approach fits best?
hard
A. Use batch prediction for support calls and real-time serving for daily updates.
B. Use only real-time serving for all predictions to keep data fresh.
C. Use batch prediction only and ignore real-time serving.
D. Use batch prediction once a day for all customers, and real-time serving for support calls.

Solution

  1. Step 1: Analyze batch prediction use case

    Predicting churn for 1 million customers once a day fits batch prediction well because it handles large data offline.
  2. Step 2: Analyze real-time serving use case

    Instant offers during support calls require quick predictions, so real-time serving is best.
  3. Final Answer:

    Use batch prediction once a day for all customers, and real-time serving for support calls. -> Option D
  4. Quick Check:

    Batch for bulk daily, real-time for instant [OK]
Hint: Batch for bulk jobs; real-time for instant needs [OK]
Common Mistakes:
  • Using real-time for all large data
  • Ignoring instant prediction needs
  • Mixing batch and real-time roles
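The combined design from question 5 can be sketched as two code paths that share one model. Everything here is hypothetical: `StubChurnModel`, the tenure feature, and the in-memory `daily_scores` dictionary stand in for a real model, real features, and a table refreshed by a scheduled job.

```python
class StubChurnModel:
    """Hypothetical model: churn score falls as tenure (in months) grows."""

    def predict(self, tenure_months):
        return round(1.0 / (1.0 + tenure_months), 3)


model = StubChurnModel()
daily_scores = {}  # customer_id -> score, refreshed by the scheduled batch job


def run_daily_batch(customers):
    """Batch path: score every customer in one scheduled run."""
    for customer_id, tenure_months in customers.items():
        daily_scores[customer_id] = model.predict(tenure_months)
    return len(daily_scores)


def score_on_call(tenure_months):
    """Real-time path: score the caller immediately with the latest features."""
    return model.predict(tenure_months)


scored = run_daily_batch({"c1": 1, "c2": 9})
print(scored, daily_scores)  # 2 {'c1': 0.5, 'c2': 0.1}
print(score_on_call(3))      # 0.25
```

Both paths call the same model, so the daily report and the support-call score stay consistent; only the trigger and the volume differ.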