Process Flow - Batch prediction vs real-time serving
Start
↓
Input Data
↓ (splits into two paths)
Batch Prediction              Real-Time Serving
↓                             ↓
Process large data together   Predict on a single request
↓                             ↓
Store results                 Return result immediately
↓                             ↓
End                           End
The diagram shows input data splitting into two paths: batch prediction, where many data points are processed together, and real-time serving, where a single request is answered immediately.
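The two paths above can be sketched as a minimal dispatcher. The `serve` router and the routing rule (a list of data points takes the batch path, a single data point takes the real-time path) are assumptions for illustration, not part of the original example:

```python
def predict(data):
    # Stub standing in for a trained model's prediction (assumption).
    return f"result_{data}"

def serve(request):
    # Hypothetical router: a list of data points takes the batch path,
    # a single data point takes the real-time path.
    if isinstance(request, list):
        return [predict(d) for d in request]  # batch: process together, return all
    return predict(request)                   # real-time: answer immediately

print(serve(["d1", "d2", "d3"]))  # -> ['result_d1', 'result_d2', 'result_d3']
print(serve("dX"))                # -> result_dX
```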
Execution Sample
MLOps
def batch_predict(data_batch):
    # `model` is assumed to be an already-trained model object
    results = []
    for data in data_batch:
        results.append(model.predict(data))
    return results

def real_time_serve(single_data):
    return model.predict(single_data)
Two functions: one predicts on a batch of data, the other predicts on a single data point instantly.
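The snippet above is not runnable on its own because `model` is never defined. A minimal runnable sketch, assuming `model` is any object exposing a `predict` method (the `StubModel` class and its doubling "prediction" are placeholders):

```python
class StubModel:
    # Stand-in for a trained model (assumption: the snippet's `model`
    # exposes a .predict() method).
    def predict(self, data):
        return data * 2  # placeholder "prediction"

model = StubModel()

def batch_predict(data_batch):
    results = []
    for data in data_batch:
        results.append(model.predict(data))
    return results

def real_time_serve(single_data):
    return model.predict(single_data)

print(batch_predict([1, 2, 3]))  # -> [2, 4, 6]
print(real_time_serve(5))        # -> 10
```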
Process Table
| Step | Function        | Input        | Action                    | Output/Result               |
|------|-----------------|--------------|---------------------------|-----------------------------|
| 1    | batch_predict   | [d1, d2, d3] | Start loop over batch     | No output yet               |
| 2    | batch_predict   | d1           | Predict on d1             | result1                     |
| 3    | batch_predict   | d2           | Predict on d2             | result2                     |
| 4    | batch_predict   | d3           | Predict on d3             | result3                     |
| 5    | batch_predict   | N/A          | Return all results        | [result1, result2, result3] |
| 6    | real_time_serve | dX           | Predict on single data dX | resultX                     |
💡 Batch prediction ends after processing all data in the batch; real-time serving ends after a single prediction.
Status Tracker
| Variable    | Start | After iter 1 | After iter 2       | After iter 3                | Final                       |
|-------------|-------|--------------|--------------------|-----------------------------|-----------------------------|
| results     | []    | [result1]    | [result1, result2] | [result1, result2, result3] | [result1, result2, result3] |
| data        | N/A   | d1           | d2                 | d3                          | N/A                         |
| single_data | dX    | dX           | dX                 | dX                          | dX                          |
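The growth of `results` shown in the Status Tracker can be reproduced by printing the list after each iteration. This is a sketch; the `StubModel` and its `"d1" -> "result1"` mapping are assumptions chosen to match the tracker's labels:

```python
class StubModel:
    def predict(self, data):
        # Map "d1" -> "result1", "d2" -> "result2", ... (assumption,
        # chosen to match the Status Tracker's labels).
        return "result" + data[1:]

model = StubModel()

def batch_predict_traced(data_batch):
    results = []
    for data in data_batch:
        results.append(model.predict(data))
        print(f"after {data}: results = {results}")  # mirrors one tracker column
    return results

final = batch_predict_traced(["d1", "d2", "d3"])
# prints:
# after d1: results = ['result1']
# after d2: results = ['result1', 'result2']
# after d3: results = ['result1', 'result2', 'result3']
```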
Key Moments - 3 Insights
Why does batch prediction take longer than real-time serving?
Batch prediction processes many data points one after another (see Process Table, steps 2-4), so the total time grows with the batch size. Real-time serving handles a single data point per request (step 6).
Can real-time serving handle multiple requests at once like batch prediction?
In this simplified example, no: real-time serving answers one request at a time so each caller gets an immediate result, unlike batch prediction, which processes many data points together (see the Process Flow).
Why does batch prediction store results instead of returning them one by one?
Batch prediction collects all results before returning so the caller receives the whole batch in a single response, which is more efficient than replying per data point (see the Status Tracker, where 'results' grows over the iterations).
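The timing claim above can be illustrated with a small sketch. The fixed per-prediction cost simulated with `time.sleep` is an assumption for demonstration, not a real model's latency:

```python
import time

PREDICT_COST = 0.01  # seconds; hypothetical fixed cost per prediction

def predict(data):
    time.sleep(PREDICT_COST)  # simulate one model inference
    return data

start = time.perf_counter()
batch_results = [predict(d) for d in range(5)]  # batch: 5 sequential predictions
batch_time = time.perf_counter() - start

start = time.perf_counter()
single_result = predict(0)                      # real-time: one prediction
single_time = time.perf_counter() - start

# Batch latency grows with the number of data points;
# one real-time request costs roughly a single prediction.
print(batch_time > single_time)  # -> True
```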
Visual Quiz - 3 Questions
Test your understanding
Looking at the Process Table, what is the output after step 4 in batch_predict?
A) [result1, result2, result3]
B) [result1, result2]
C) result3
D) No output yet
💡 Hint
Check the 'Output/Result' column at step 4 in the Process Table.
At which step does real_time_serve produce its output?
A) Step 3
B) Step 5
C) Step 6
D) Step 2
💡 Hint
Look for the row with function 'real_time_serve' in the Process Table.
If batch_predict processed 5 data points instead of 3, how would 'results' look after the third loop iteration?
A) It would have 3 results
B) It would have 5 results
C) It would have 2 results
D) It would be empty
💡 Hint
'results' grows by one per iteration, regardless of the total batch size.
Concept Snapshot
Batch prediction:
- Processes many data points together
- Runs predictions in a loop
- Returns all results at once
Real-time serving:
- Processes one data point instantly
- Returns result immediately
- Used for live user requests
Full Transcript
This visual execution compares batch prediction and real-time serving in machine learning operations. Batch prediction takes a list of data points and predicts on each one in a loop, collecting results before returning them all together. Real-time serving takes a single data point and returns the prediction immediately. The flow diagram shows input splitting into two paths: batch and real-time. The execution table traces each step of batch prediction looping through data points and real-time serving handling one request. Variable tracking shows how the results list grows during batch prediction. Key moments clarify why batch takes longer and why real-time serves single requests. The quiz tests understanding of output timing and variable changes. This helps beginners see the difference in processing style and timing between batch and real-time prediction.