
Batch prediction vs real-time serving in MLOps - CLI Comparison

Introduction
A trained machine learning model can make predictions in two main ways. Batch prediction processes many data points at once, while real-time serving answers individual requests immediately. Choosing the right approach keeps your application both efficient and responsive.
Batch: when you want to analyze a large set of customer data overnight to find trends.
Real-time: when your app needs to recommend a product instantly as a user visits a page.
Batch: when computing resources are limited and running predictions in bulk saves cost.
Real-time: when you must respond quickly to user input, like fraud detection during a transaction.
Batch: when you want to update predictions regularly but not instantly, like daily sales forecasts.
Commands
This command runs batch prediction using MLflow. It takes many inputs from a CSV file and writes predictions to another CSV file. Use this when you want to process data in bulk.
Terminal
mlflow models predict -m models:/my-model/Production -i batch_input.csv -o batch_output.csv
Expected Output
Prediction completed successfully. Output saved to batch_output.csv
-m - Specifies the model to use
-i - Input file with data to predict
-o - Output file to save predictions
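The batch flow above can be sketched in plain Python. This is an illustration, not MLflow's implementation: the `predict` function is a stub standing in for the real model, its threshold rule is invented for the example, and the file names are just placeholders.

```python
import csv

def predict(row):
    # Stub model: a real pipeline would call the loaded MLflow model here.
    # Classifies as 0 when the last feature is small (illustration only).
    return 0 if float(row[-1]) < 1.0 else 1

def batch_predict(input_path, output_path):
    """Read every row from input_path, predict, and write all results at once."""
    with open(input_path, newline="") as fin, \
         open(output_path, "w", newline="") as fout:
        reader = csv.reader(fin)
        writer = csv.writer(fout)
        for row in reader:
            # Append the prediction as a new column on each input row.
            writer.writerow(row + [predict(row)])
```

The key property is that nothing is returned until the whole file has been processed, which is exactly why batch mode is a poor fit for interactive requests.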
This command starts a real-time prediction server on port 1234. It waits for single prediction requests and responds immediately. Use this when your app needs instant answers.
Terminal
mlflow models serve -m models:/my-model/Production -p 1234
Expected Output
2024/06/01 12:00:00 Starting MLflow model server for model 'my-model' on port 1234
-m - Specifies the model to serve
-p - Port number for the server
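To see what a serving endpoint does conceptually, here is a toy stand-in built on Python's standard library. It is not MLflow's server, just a sketch of the request/response loop: a stub model answers `POST /invocations` requests one at a time, and the payload key `inputs` follows the MLflow 2.x scoring convention.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(features):
    # Stub model: a real server would delegate to the loaded MLflow model.
    return 0 if features[-1] < 1.0 else 1

class InvocationsHandler(BaseHTTPRequestHandler):
    """Answers POST /invocations with JSON predictions, one request at a time."""

    def do_POST(self):
        length = int(self.headers["Content-Length"])
        payload = json.loads(self.rfile.read(length))
        preds = [predict(row) for row in payload["inputs"]]
        body = json.dumps({"predictions": preds}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the example quiet

if __name__ == "__main__":
    # Blocks forever, answering one prediction request at a time on port 1234.
    HTTPServer(("localhost", 1234), InvocationsHandler).serve_forever()
```

Unlike the batch command, the server holds the model in memory and stays running, so each individual request pays no startup cost.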
This command sends a single data point to the real-time server for prediction. It shows how your app can ask for one prediction at a time.
Terminal
curl -X POST -H 'Content-Type: application/json' -d '{"inputs": [[5.1, 3.5, 1.4, 0.2]]}' http://localhost:1234/invocations
Expected Output
{"predictions": [0]}
Key Concept

Batch prediction processes many data points at once for efficiency, while real-time serving answers single requests instantly for responsiveness.

Common Mistakes
Mistake: using batch prediction commands for real-time needs.
Why it fails: batch prediction waits for all data before responding, causing delays in real-time apps.
Fix: use a real-time serving command or API to get instant predictions.
Mistake: starting a real-time server but sending requests in the wrong format.
Why it fails: the server expects JSON; malformed payloads cause errors or no response.
Fix: send prediction requests as JSON with the correct data structure.
Summary
Batch prediction runs many inputs at once and saves results to a file.
Real-time serving starts a server that answers one prediction request at a time.
Use batch for large offline jobs and real-time for instant user interactions.