How to Serve an ML Model: Simple Steps to Deploy Your Model
To serve an ML model, you typically create a web API using a framework like Flask or FastAPI that loads the trained model and responds to prediction requests. This allows other applications to send data and receive predictions in real time.

Syntax
Serving an ML model usually involves these steps:
- Load the trained model: Use a library like `joblib` or `pickle` to load your saved model.
- Create a web server: Use a web framework like Flask to handle incoming requests.
- Define an API endpoint: Create a route (e.g., `/predict`) that accepts input data.
- Process input and predict: Convert the input data to the right format and call the model's `predict` method.
- Return the prediction: Send the prediction back as a JSON response.
```python
from flask import Flask, request, jsonify
import joblib

app = Flask(__name__)

# Load the trained model once at startup
model = joblib.load('model.joblib')

@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json(force=True)
    features = data['features']
    prediction = model.predict([features])
    return jsonify({'prediction': prediction[0].item()})

if __name__ == '__main__':
    app.run(debug=True)
```
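The loading step above assumes a `model.joblib` file already exists on disk. As a minimal sketch of how persisting and reloading work, here is the same round trip with the standard-library `pickle` module (`joblib` exposes an equivalent `dump`/`load` pair; the dictionary "model" below is a stand-in for a fitted estimator):

```python
import pickle

# A stand-in "model": any picklable Python object round-trips the same way.
# In practice this would be a fitted scikit-learn estimator.
model = {"weights": [0.5, -1.2, 3.0]}

# Save the model to disk
with open('model.pkl', 'wb') as f:
    pickle.dump(model, f)

# Load it back, e.g. at server startup
with open('model.pkl', 'rb') as f:
    loaded = pickle.load(f)

print(loaded["weights"])  # the same object state, restored
```

The same pattern applies with `joblib.dump(model, 'model.joblib')` and `joblib.load('model.joblib')`, which is generally preferred for scikit-learn models containing large NumPy arrays.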
Example
This example shows how to serve a simple scikit-learn model using Flask. It loads a saved model, accepts JSON input with features, and returns the prediction.
```python
from flask import Flask, request, jsonify
import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris

app = Flask(__name__)

# For demonstration, create and save a simple model
iris = load_iris()
X, y = iris.data, iris.target
model = LogisticRegression(max_iter=200)
model.fit(X, y)
joblib.dump(model, 'model.joblib')

# Load the model
model = joblib.load('model.joblib')

@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json(force=True)
    features = data['features']  # expects a list of numbers
    features_array = np.array(features).reshape(1, -1)
    prediction = model.predict(features_array)
    return jsonify({'prediction': int(prediction[0])})

if __name__ == '__main__':
    app.run(debug=True)
```
Output
* Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
127.0.0.1 - - [Date Time] "POST /predict HTTP/1.1" 200 -
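Once the dev server is running, you can exercise the endpoint by POSTing a JSON body with a `features` key. As a sketch (the four numbers are a sample iris-style measurement, assumed here for illustration), the payload can be built with the standard-library `json` module and sent with `curl` or `requests`:

```python
import json

# Request body the /predict endpoint expects
payload = {"features": [5.1, 3.5, 1.4, 0.2]}  # one 4-feature sample
body = json.dumps(payload)
print(body)  # {"features": [5.1, 3.5, 1.4, 0.2]}

# Equivalent curl call against the dev server:
#   curl -X POST http://127.0.0.1:5000/predict \
#        -H "Content-Type: application/json" \
#        -d '{"features": [5.1, 3.5, 1.4, 0.2]}'

# A successful response is a JSON object such as {"prediction": 0}
response = json.loads('{"prediction": 0}')
print(response["prediction"])  # 0
```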
Common Pitfalls
- Not matching input format: The input JSON must match the expected feature format exactly.
- Model file missing or incompatible: Ensure the model file is saved and loaded correctly with matching library versions.
- Not handling errors: Add error handling for bad inputs or prediction failures.
- Running in debug mode in production: Debug mode is for development only and should be off in production.
```python
from flask import Flask, request, jsonify
import joblib

app = Flask(__name__)

# Wrong: the model is never loaded before prediction
@app.route('/predict', methods=['POST'])
def predict_wrong():
    data = request.get_json(force=True)
    features = data['features']
    # If the model is not loaded anywhere, this raises a NameError;
    # even when it is, the raw NumPy value below may not be JSON-serializable
    prediction = model.predict([features])
    return jsonify({'prediction': prediction[0]})

# Correct: load the model once, before it is used
model = joblib.load('model.joblib')

@app.route('/predict_correct', methods=['POST'])
def predict_correct():
    data = request.get_json(force=True)
    features = data['features']
    prediction = model.predict([features])
    # Cast to a plain int so the value is JSON-serializable
    return jsonify({'prediction': int(prediction[0])})
```
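The "not handling errors" pitfall can be addressed by validating the parsed JSON before calling `predict`. A minimal sketch, assuming four expected features as in the iris example (the helper name and signature are illustrative, not part of any library):

```python
def validate_features(data, n_features=4):
    """Check the parsed JSON body before it reaches model.predict."""
    if not isinstance(data, dict) or 'features' not in data:
        raise ValueError("request body must be JSON with a 'features' key")
    features = data['features']
    if not isinstance(features, list) or len(features) != n_features:
        raise ValueError(f"'features' must be a list of {n_features} numbers")
    if not all(isinstance(x, (int, float)) for x in features):
        raise ValueError("all features must be numbers")
    return features

# Inside the Flask route, wrap it in try/except and return a 400 on bad input:
#   try:
#       features = validate_features(request.get_json(force=True))
#   except ValueError as e:
#       return jsonify({'error': str(e)}), 400

print(validate_features({'features': [5.1, 3.5, 1.4, 0.2]}))  # [5.1, 3.5, 1.4, 0.2]
```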
Quick Reference
- Use `joblib.load()` or `pickle.load()` to load your saved model.
- Use Flask or FastAPI to create a web server for your model.
- Define a POST endpoint that accepts JSON input with features.
- Convert input to the correct shape before calling `model.predict()`.
- Return predictions as JSON responses.
- Test your API locally before deploying.
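The "convert input to the correct shape" item is where many first attempts fail: scikit-learn models expect a 2-D array of shape `(n_samples, n_features)`, even for a single sample. A minimal sketch with NumPy:

```python
import numpy as np

features = [5.1, 3.5, 1.4, 0.2]  # one sample, as a flat list from JSON

# model.predict expects shape (n_samples, n_features)
features_array = np.array(features).reshape(1, -1)
print(features_array.shape)  # (1, 4)
```

Passing the flat list directly (shape `(4,)`) typically raises a shape error, which is why the examples above call `reshape(1, -1)` before predicting.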
Key Takeaways
- Load your trained ML model before serving it via an API.
- Use web frameworks like Flask to create endpoints that accept input and return predictions.
- Ensure the input data format matches what the model expects to avoid errors.
- Add error handling and keep debug mode off in production.
- Test your serving API locally before deploying to production.