How to Use BentoML: Simple Guide to Model Serving
Use BentoML to save your machine learning model with bentoml.sklearn.save_model(), create a service with bentoml.Service, and serve it via a REST API or the CLI. This lets you deploy models quickly without complex setup.
Syntax
BentoML uses simple commands to save models, create services, and run servers.
- bentoml.sklearn.save_model(model_name, model_object): Save your trained model to the local model store.
- bentoml.Service(name, runners=[...]): Create a service that wraps your model runners.
- @svc.api(input, output): Define API endpoints for prediction.
- bentoml serve service:svc: Run the service locally from the command line.
```python
import bentoml
from bentoml.io import JSON

# Save a trained model to the local model store
bentoml.sklearn.save_model('my_model', model)

# Create a runner from the saved model and a service that wraps it
runner = bentoml.sklearn.get('my_model:latest').to_runner()
svc = bentoml.Service('my_service', runners=[runner])

# Define a JSON-in, JSON-out API endpoint
@svc.api(input=JSON(), output=JSON())
def predict(input_data):
    return runner.predict.run(input_data)
```
Example
This example shows how to save a simple scikit-learn model, create a BentoML service, and run predictions via API.
```python
import bentoml
import numpy as np
from bentoml.io import JSON
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Train a simple model
iris = load_iris()
X, y = iris.data, iris.target
model = RandomForestClassifier()
model.fit(X, y)

# Save the model to the BentoML model store
bentoml.sklearn.save_model('iris_rf', model)

# Create a runner from the saved model and register it with the service
runner = bentoml.sklearn.get('iris_rf:latest').to_runner()
svc = bentoml.Service('iris_classifier', runners=[runner])

@svc.api(input=JSON(), output=JSON())
def classify(input_data):
    prediction = runner.predict.run(np.array(input_data))
    return prediction.tolist()

# Start the server from the command line:
#   bentoml serve service:svc
```
Output
```
INFO: Started server process [PID]
INFO: Waiting for application startup.
INFO: Application startup complete.
```
You can now send POST requests with JSON data to http://localhost:3000/classify.
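Once the server is running, predictions can be requested from any HTTP client. A minimal sketch using the third-party requests library (the endpoint name and port match the example above; the guard for a missing server is only for illustration):

```python
import requests

# One iris sample: sepal length/width and petal length/width.
# The classify endpoint expects a JSON array of feature rows.
features = [[5.1, 3.5, 1.4, 0.2]]

try:
    response = requests.post('http://localhost:3000/classify', json=features)
    print(response.json())  # a list of predicted class indices
except requests.exceptions.ConnectionError:
    print('Server is not running; start it with: bentoml serve service:svc')
```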
Common Pitfalls
Common mistakes when using BentoML include:
- Not saving the model before creating the service, causing errors when loading.
- Forgetting to call to_runner() and use the runner inside the API function.
- Not matching input/output types in the API decorator, leading to data format errors.
- Running the service without installing BentoML or dependencies.
```python
import bentoml
from bentoml.io import JSON

# Wrong: using the model object directly without saving it first
model = ...  # trained model
svc = bentoml.Service('wrong_service')

@svc.api(input=JSON(), output=JSON())
def predict(data):
    # This fails: the model was never saved to the BentoML store,
    # and no runner is used for inference
    return model.predict(data)

# Correct:
# 1. Save the model first:
#    bentoml.sklearn.save_model('model_name', model)
# 2. Create a runner and register it with the service:
#    runner = bentoml.sklearn.get('model_name:latest').to_runner()
#    svc = bentoml.Service('right_service', runners=[runner])
# 3. Call the runner inside the API:
#    return runner.predict.run(data)
```
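The input/output mismatch pitfall often surfaces as a serialization error: NumPy arrays returned by a model are not JSON-serializable. A small standalone sketch of why the examples above call .tolist() before returning:

```python
import json

import numpy as np

prediction = np.array([0, 2, 1])  # what a sklearn classifier typically returns

# Returning a raw ndarray from a JSON-output endpoint fails to serialize
try:
    json.dumps(prediction)
    serializable = True
except TypeError:
    serializable = False  # Object of type ndarray is not JSON serializable

# Converting to a plain Python list makes it JSON-friendly
payload = json.dumps(prediction.tolist())
print(serializable, payload)
```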
Quick Reference
Here is a quick summary of key BentoML commands:
| Command | Purpose |
|---|---|
| bentoml.sklearn.save_model('name', model) | Save a scikit-learn model |
| bentoml.Service('service_name') | Create a BentoML service |
| @svc.api(input=..., output=...) | Define API endpoint with input/output types |
| bentoml serve service:svc | Run the service locally from its Python module |
| bentoml serve service_name:latest | Serve a built Bento via the CLI |
Key Takeaways
- Always save your trained model with BentoML before creating a service.
- Use a BentoML Service and API decorators to define model prediction endpoints.
- Run the service locally with bentoml serve service:svc from the command line.
- Match input and output types in the API to avoid data format errors.
- Use model runners inside API functions for efficient inference.