How to Use BentoML: Simple Guide to Model Serving
Use BentoML to save your machine learning model with bentoml.sklearn.save_model(), create a service with bentoml.Service, and serve it via a REST API or the CLI. This lets you deploy models quickly without complex setup.
Syntax
BentoML uses simple commands to save models, create services, and run servers.
- bentoml.sklearn.save_model(model_name, model_object): Save your trained model to the local model store.
- bentoml.Service(name, runners=[...]): Create a service that wraps your model runners.
- @svc.api(input, output): Define API endpoints for prediction.
- bentoml serve service:svc: Run the service locally from the command line.
```python
import bentoml
from bentoml.io import JSON

# Save a trained model to the local model store
bentoml.sklearn.save_model('my_model', model)

# Create a runner from the saved model and a service that wraps it
runner = bentoml.sklearn.get('my_model:latest').to_runner()
svc = bentoml.Service('my_service', runners=[runner])

# Define a JSON-in, JSON-out API endpoint
@svc.api(input=JSON(), output=JSON())
def predict(input_data):
    return runner.predict.run(input_data)
```
Example
This example shows how to save a simple scikit-learn model, create a BentoML service, and run predictions via API.
```python
import bentoml
import numpy as np
from bentoml.io import JSON
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Train a simple model
iris = load_iris()
X, y = iris.data, iris.target
model = RandomForestClassifier()
model.fit(X, y)

# Save the model to the BentoML model store
bentoml.sklearn.save_model('iris_rf', model)

# Create a runner from the saved model and register it with the service
runner = bentoml.sklearn.get('iris_rf:latest').to_runner()
svc = bentoml.Service('iris_classifier', runners=[runner])

@svc.api(input=JSON(), output=JSON())
def classify(input_data):
    prediction = runner.predict.run(np.array(input_data))
    return prediction.tolist()

# Start the server from the command line:
#   bentoml serve service:svc
```
Output
```
INFO: Started server process [PID]
INFO: Waiting for application startup.
INFO: Application startup complete.
```
You can now send POST requests with JSON data to http://localhost:3000/classify.
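Once the server is running, predictions can be requested from any HTTP client. A minimal sketch using the third-party requests library (the endpoint name and port match the example above; the guard for a missing server is only for illustration):

```python
import requests

# One iris sample: sepal length/width and petal length/width.
# The classify endpoint expects a JSON array of feature rows.
features = [[5.1, 3.5, 1.4, 0.2]]

try:
    response = requests.post('http://localhost:3000/classify', json=features)
    print(response.json())  # a list of predicted class indices
except requests.exceptions.ConnectionError:
    print('Server is not running; start it with: bentoml serve service:svc')
```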
Common Pitfalls
Common mistakes when using BentoML include:
- Not saving the model before creating the service, causing errors when loading.
- Forgetting to call to_runner() and use the runner inside the API function.
- Not matching input/output types in the API decorator, leading to data format errors.
- Running the service without installing BentoML or dependencies.
```python
import bentoml
from bentoml.io import JSON

# Wrong: using the model object directly without saving it first
model = ...  # trained model
svc = bentoml.Service('wrong_service')

@svc.api(input=JSON(), output=JSON())
def predict(data):
    # This fails: the model was never saved to the BentoML store,
    # and no runner is used for inference
    return model.predict(data)

# Correct:
# 1. Save the model first:
#    bentoml.sklearn.save_model('model_name', model)
# 2. Create a runner and register it with the service:
#    runner = bentoml.sklearn.get('model_name:latest').to_runner()
#    svc = bentoml.Service('right_service', runners=[runner])
# 3. Call the runner inside the API:
#    return runner.predict.run(data)
```
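The input/output mismatch pitfall often surfaces as a serialization error: NumPy arrays returned by a model are not JSON-serializable. A small standalone sketch of why the examples above call .tolist() before returning:

```python
import json

import numpy as np

prediction = np.array([0, 2, 1])  # what a sklearn classifier typically returns

# Returning a raw ndarray from a JSON-output endpoint fails to serialize
try:
    json.dumps(prediction)
    serializable = True
except TypeError:
    serializable = False  # Object of type ndarray is not JSON serializable

# Converting to a plain Python list makes it JSON-friendly
payload = json.dumps(prediction.tolist())
print(serializable, payload)
```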
Quick Reference
Here is a quick summary of key BentoML commands:
| Command | Purpose |
|---|---|
| bentoml.sklearn.save_model('name', model) | Save a scikit-learn model |
| bentoml.Service('service_name') | Create a BentoML service |
| @svc.api(input=..., output=...) | Define API endpoint with input/output types |
| bentoml serve service:svc | Run the service locally from its Python module |
| bentoml serve service_name:latest | Serve a built Bento via the CLI |
Key Takeaways
- Always save your trained model with BentoML before creating a service.
- Use a BentoML Service and API decorators to define model prediction endpoints.
- Run the service locally with bentoml serve service:svc from the command line.
- Match input and output types in the API to avoid data format errors.
- Use model runners inside API functions for efficient inference.