Batch vs real-time inference in NLP - Experiment Comparison

Experiment - Batch vs real-time inference
Problem: You have a trained text classification model that labels customer reviews as positive or negative. Currently, you run the model on a batch of 1000 reviews once a day (batch inference). You want to explore real-time inference, where each review is classified as soon as it arrives.
Current Metrics: Batch inference accuracy: 88%, average processing time per batch: 30 seconds
Issue: Batch inference runs only once a day, so no review gets immediate feedback. Real-time inference returns each result immediately but may take longer per review. You need to compare the accuracy and speed of both approaches.
Your Task
Implement both batch and real-time inference for the text classification model. Measure and compare accuracy and processing time. Aim to keep accuracy above 85% and reduce average latency per review in real-time inference below 0.1 seconds.
Use the same trained model for both inference methods
Do not retrain or change the model architecture
Measure time accurately using Python's time module (for example, time.perf_counter)
Solution
import time
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.metrics import accuracy_score

# Sample data (for demonstration)
reviews = ["I love this product", "This is bad", "Excellent quality", "Not good", "Very happy", "Terrible experience"] * 200
labels = [1, 0, 1, 0, 1, 0] * 200  # 1=positive, 0=negative

# Train a simple model
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(reviews, labels)

# Prepare test data
test_reviews = ["I love it", "Worst ever", "Pretty good", "Not what I expected", "Fantastic", "Awful"] * 167
true_labels = [1, 0, 1, 0, 1, 0] * 167

# Batch inference: one predict call over all reviews
start_batch = time.perf_counter()  # monotonic, high-resolution timer
pred_batch = model.predict(test_reviews)
batch_time = time.perf_counter() - start_batch
batch_accuracy = accuracy_score(true_labels, pred_batch)
batch_avg_time = batch_time / len(test_reviews)

# Real-time inference: one predict call per review, as if each arrived separately
pred_real = []
start_real = time.perf_counter()
for review in test_reviews:
    pred = model.predict([review])[0]  # predict expects an iterable of texts
    pred_real.append(pred)
real_time = time.perf_counter() - start_real
real_accuracy = accuracy_score(true_labels, pred_real)
real_avg_time = real_time / len(test_reviews)

print(f"Batch inference accuracy: {batch_accuracy*100:.2f}%, average time per review: {batch_avg_time:.4f} seconds")
print(f"Real-time inference accuracy: {real_accuracy*100:.2f}%, average time per review: {real_avg_time:.4f} seconds")
Implemented batch inference by predicting all reviews at once
Implemented real-time inference by predicting one review at a time in a loop
Measured total and average inference time for both methods
Compared accuracy to ensure model predictions remain consistent
Results Interpretation

Batch inference: Accuracy 88.10%, Avg time/review 0.0015s

Real-time inference: Accuracy 88.10%, Avg time/review 0.0100s

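Batch inference is much faster per review because the fixed overhead of each predict call is spread across all inputs, but results are only available once the whole batch has run. Real-time inference returns each prediction as soon as the review arrives, but it pays that overhead on every call, so it is slower per review. Accuracy is identical because both methods use the same unchanged model. The figures above are illustrative; exact values depend on the data and hardware.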
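An average can hide slow individual requests, so for real-time inference it may help to look at the latency distribution as well. The sketch below times each single-item call separately and reports the mean and the 95th percentile. The `keyword_predict` function is a hypothetical stand-in for `model.predict`, used only to keep the example free of dependencies; any callable that accepts a list of texts could be passed instead.

```python
import statistics
import time


def keyword_predict(texts):
    """Hypothetical stand-in for model.predict: 1 if a positive word appears, else 0."""
    positive = {"love", "excellent", "happy", "fantastic"}
    return [int(any(word in positive for word in t.lower().split())) for t in texts]


def per_request_latencies(predict_fn, texts):
    """Time each single-item call on its own and return the latencies in seconds."""
    latencies = []
    for text in texts:
        start = time.perf_counter()
        predict_fn([text])
        latencies.append(time.perf_counter() - start)
    return latencies


texts = ["I love it", "Worst ever", "Fantastic", "Awful"] * 250
latencies = per_request_latencies(keyword_predict, texts)

mean_latency = statistics.mean(latencies)
# quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
p95_latency = statistics.quantiles(latencies, n=100)[94]
print(f"mean: {mean_latency:.6f}s, p95: {p95_latency:.6f}s")
```

Comparing the 95th percentile against the 0.1-second target is a stricter check than comparing the mean, since it shows whether most individual requests meet the target.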
Bonus Experiment
Try using a smaller or faster model (like a simpler classifier) to reduce real-time inference latency below 0.005 seconds per review while keeping accuracy above 85%.
💡 Hint
Consider using a simpler vectorizer or model such as CountVectorizer with a smaller LogisticRegression or a Naive Bayes classifier.
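One way to try this hint is sketched below, using CountVectorizer with a Multinomial Naive Bayes classifier. The training data repeats the toy reviews from the solution, and the three-review sample is an assumption made for illustration. Whether the latency actually falls below 0.005 seconds depends on your hardware, so treat this as a starting point rather than a guaranteed result.

```python
import time

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Same toy training data as in the solution above.
train_reviews = ["I love this product", "This is bad", "Excellent quality",
                 "Not good", "Very happy", "Terrible experience"] * 200
train_labels = [1, 0, 1, 0, 1, 0] * 200

# Simpler pipeline: raw word counts plus Naive Bayes.
fast_model = make_pipeline(CountVectorizer(), MultinomialNB())
fast_model.fit(train_reviews, train_labels)

# Illustrative sample (an assumption); classify one review per call.
sample = ["I love it", "Terrible experience", "Very happy"] * 100
start = time.perf_counter()
preds = [fast_model.predict([review])[0] for review in sample]
avg_latency = (time.perf_counter() - start) / len(sample)

print(f"average latency per review: {avg_latency:.6f} seconds")
```

To complete the bonus experiment, check the accuracy of `fast_model` on the same test set used for the original model before comparing latencies.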

Practice

1. What is the main difference between batch inference and real-time inference in NLP?
easy
A. Batch inference requires internet connection, real-time inference does not.
B. Batch inference is slower than real-time inference because it uses outdated models.
C. Real-time inference processes data only at night, batch inference runs during the day.
D. Batch inference processes many inputs together, while real-time inference processes inputs one by one quickly.

Solution

  1. Step 1: Understand batch inference

    Batch inference means processing many inputs together in one go, which is efficient for large data.
  2. Step 2: Understand real-time inference

    Real-time inference means processing each input immediately to give instant results.
  3. Final Answer:

    Batch inference processes many inputs together, while real-time inference processes inputs one by one quickly. -> Option D
  4. Quick Check:

    Batch = many inputs at once, real-time = one input with an immediate result [OK]
Hint: Batch = many at once, real-time = one fast [OK]
Common Mistakes:
  • Confusing batch with outdated models
  • Thinking real-time only runs at specific times
  • Mixing internet requirements
2. Which code snippet correctly represents a batch inference call for an NLP model?
easy
A. model.load('batch')
B. model.predict('text1')
C. model.predict(['text1', 'text2', 'text3'])
D. model.train(['text1', 'text2'])

Solution

  1. Step 1: Identify batch input format

    Batch inference requires passing multiple inputs together, usually as a list or array.
  2. Step 2: Check code options

    model.predict(['text1', 'text2', 'text3']) passes a list of texts to predict, which is correct for batch inference.
  3. Final Answer:

    model.predict(['text1', 'text2', 'text3']) -> Option C
  4. Quick Check:

    Batch input = list of texts [OK]
Hint: Batch inference uses list input for prediction [OK]
Common Mistakes:
  • Passing single string instead of list
  • Confusing training with inference
  • Using unrelated method like load
3. Given the code below, what will be the output type of results?
texts = ['hello', 'world']
results = model.predict(texts)
Assuming model.predict returns predictions for each input.
medium
A. A list of predictions, one for each input text
B. A single prediction combining all texts
C. An error because input is a list
D. A dictionary with input texts as keys

Solution

  1. Step 1: Understand input to model.predict

    The input is a list of texts, so the model returns a separate prediction for each text.
  2. Step 2: Understand output type for batch input

    For batch input, the output is usually a list (or array) of predictions whose length matches the number of inputs.
  3. Final Answer:

    A list of predictions, one for each input text -> Option A
  4. Quick Check:

    Batch input gives list output [OK]
Hint: Batch input returns list output matching inputs [OK]
Common Mistakes:
  • Expecting single combined prediction
  • Thinking list input causes error
  • Assuming output is a dictionary
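As a concrete check, here is a minimal sketch using a scikit-learn pipeline like the one in the experiment. The tiny training set is an assumption made purely for illustration. With scikit-learn, `predict` returns a NumPy array rather than a Python list, but it still holds one prediction per input.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny illustrative training set (an assumption, not part of the quiz).
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(["hello there", "goodbye world", "hello friend", "cruel world"],
          [1, 0, 1, 0])

texts = ["hello", "world"]
results = model.predict(texts)

print(type(results).__name__)      # container type returned by predict
print(len(results) == len(texts))  # one prediction per input text
```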
4. Identify the error in this real-time inference code snippet:
input_text = ['Hello world']
prediction = model.predict(input_text)
Assuming model.predict expects a single string for real-time inference.
medium
A. Input should be a string, not a list
B. model.predict cannot process text
C. Missing batch size parameter
D. Prediction variable name is invalid

Solution

  1. Step 1: Check input type for real-time inference

    Under the stated assumption, model.predict expects a single input string, not a list.
  2. Step 2: Identify mismatch in code

    The code passes a list containing one string, so the input type does not match what model.predict expects.
  3. Final Answer:

    Input should be a string, not a list -> Option A
  4. Quick Check:

    Assumed API takes a string, code passes a list [OK]
Hint: Match the input type to what predict expects; here, a single string [OK]
Common Mistakes:
  • Passing list instead of string
  • Assuming batch size needed for real-time
  • Thinking variable name causes error
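The question above assumes a hypothetical API that takes a single string. scikit-learn pipelines, like the one used in the experiment, work the other way around: `predict` expects an iterable of documents and rejects a bare string. A thin wrapper can hide that difference from real-time callers. In the sketch below, `predict_one` is a hypothetical helper name and the training data is illustrative.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Illustrative training data (an assumption).
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(["I love this product", "This is bad", "Very happy", "Not good"],
          [1, 0, 1, 0])


def predict_one(model, text):
    """Hypothetical helper: wrap one string in a list, unwrap the single result."""
    return model.predict([text])[0]


# A bare string is rejected by the vectorizer inside the pipeline.
try:
    model.predict("Hello world")
    bare_string_rejected = False
except ValueError:
    bare_string_rejected = True

prediction = predict_one(model, "Hello world")
print(bare_string_rejected, prediction)
```

The point is to check what input type a given model's `predict` expects instead of assuming that real-time inference always means passing a string.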
5. You have a large dataset of 10,000 sentences to classify using an NLP model. You want to minimize total processing time but can wait a few minutes for results. Which inference method should you choose and why?
hard
A. Neither, you should retrain the model first.
B. Batch inference, because processing many inputs together is more efficient for large data.
C. Real-time inference, because it processes each sentence instantly.
D. Real-time inference, because it uses less memory.

Solution

  1. Step 1: Analyze dataset size and time constraints

    With 10,000 sentences and willingness to wait minutes, efficiency matters more than instant results.
  2. Step 2: Choose inference method based on efficiency

    Batch inference processes many inputs together, reducing overhead and total time.
  3. Final Answer:

    Batch inference, because processing many inputs together is more efficient for large data. -> Option B
  4. Quick Check:

    Large data + wait time = batch inference [OK]
Hint: Large data with wait time? Use batch inference [OK]
Common Mistakes:
  • Choosing real-time for large batch
  • Thinking retraining is needed
  • Assuming real-time uses less memory always
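For a dataset this size, batch inference is often run in fixed-size chunks so that memory use stays bounded while most of the batching benefit is kept. The sketch below shows that pattern. `keyword_predict` is a hypothetical stand-in for `model.predict`, and the chunk size of 1000 is an arbitrary choice for illustration.

```python
def keyword_predict(texts):
    """Hypothetical stand-in for model.predict: 1 if a positive word appears, else 0."""
    positive = {"love", "excellent", "happy", "fantastic"}
    return [int(any(word in positive for word in t.lower().split())) for t in texts]


def predict_in_chunks(predict_fn, texts, chunk_size=1000):
    """Run batch inference chunk by chunk and collect predictions in input order."""
    predictions = []
    for start in range(0, len(texts), chunk_size):
        chunk = texts[start:start + chunk_size]
        predictions.extend(predict_fn(chunk))
    return predictions


sentences = ["I love it", "Awful"] * 5000  # 10,000 sentences
predictions = predict_in_chunks(keyword_predict, sentences, chunk_size=1000)
print(len(predictions))
```

Here the 10,000 sentences take ten calls instead of 10,000, so the per-call overhead is paid far less often than in real-time inference.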