How to Serve an ML Model: Simple Steps to Deploy Your Model
To serve an ML model, you typically create a web API using a framework like Flask or FastAPI that loads the trained model and responds to prediction requests. This allows other applications to send data and receive predictions in real time.

Syntax
Serving an ML model usually involves these steps:
- Load the trained model: Use a library like `joblib` or `pickle` to load your saved model.
- Create a web server: Use a web framework like Flask to handle incoming requests.
- Define an API endpoint: Create a route (e.g., `/predict`) that accepts input data.
- Process input and predict: Convert the input data to the right format and call the model's `predict` method.
- Return the prediction: Send the prediction back as a JSON response.
```python
from flask import Flask, request, jsonify
import joblib

app = Flask(__name__)

# Load the trained model once at startup
model = joblib.load('model.joblib')

@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json(force=True)
    features = data['features']
    prediction = model.predict([features])
    return jsonify({'prediction': prediction[0].item()})

if __name__ == '__main__':
    app.run(debug=True)
```
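The loading step above assumes a `model.joblib` file already exists on disk. As a minimal sketch of how persisting and reloading work, here is the same round trip with the standard-library `pickle` module (`joblib` exposes an equivalent `dump`/`load` pair; the dictionary "model" below is a stand-in for a fitted estimator):

```python
import pickle

# A stand-in "model": any picklable Python object round-trips the same way.
# In practice this would be a fitted scikit-learn estimator.
model = {"weights": [0.5, -1.2, 3.0]}

# Save the model to disk
with open('model.pkl', 'wb') as f:
    pickle.dump(model, f)

# Load it back, e.g. at server startup
with open('model.pkl', 'rb') as f:
    loaded = pickle.load(f)

print(loaded["weights"])  # the same object state, restored
```

The same pattern applies with `joblib.dump(model, 'model.joblib')` and `joblib.load('model.joblib')`, which is generally preferred for scikit-learn models containing large NumPy arrays.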
Example
This example shows how to serve a simple scikit-learn model using Flask. It loads a saved model, accepts JSON input with features, and returns the prediction.
```python
from flask import Flask, request, jsonify
import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris

app = Flask(__name__)

# For demonstration, create and save a simple model
iris = load_iris()
X, y = iris.data, iris.target
model = LogisticRegression(max_iter=200)
model.fit(X, y)
joblib.dump(model, 'model.joblib')

# Load the model
model = joblib.load('model.joblib')

@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json(force=True)
    features = data['features']  # expects a list of numbers
    features_array = np.array(features).reshape(1, -1)
    prediction = model.predict(features_array)
    return jsonify({'prediction': int(prediction[0])})

if __name__ == '__main__':
    app.run(debug=True)
```
Output
* Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
127.0.0.1 - - [Date Time] "POST /predict HTTP/1.1" 200 -
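Once the dev server is running, you can exercise the endpoint by POSTing a JSON body with a `features` key. As a sketch (the four numbers are a sample iris-style measurement, assumed here for illustration), the payload can be built with the standard-library `json` module and sent with `curl` or `requests`:

```python
import json

# Request body the /predict endpoint expects
payload = {"features": [5.1, 3.5, 1.4, 0.2]}  # one 4-feature sample
body = json.dumps(payload)
print(body)  # {"features": [5.1, 3.5, 1.4, 0.2]}

# Equivalent curl call against the dev server:
#   curl -X POST http://127.0.0.1:5000/predict \
#        -H "Content-Type: application/json" \
#        -d '{"features": [5.1, 3.5, 1.4, 0.2]}'

# A successful response is a JSON object such as {"prediction": 0}
response = json.loads('{"prediction": 0}')
print(response["prediction"])  # 0
```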
Common Pitfalls
- Not matching input format: The input JSON must match the expected feature format exactly.
- Model file missing or incompatible: Ensure the model file is saved and loaded correctly with matching library versions.
- Not handling errors: Add error handling for bad inputs or prediction failures.
- Running in debug mode in production: Debug mode is for development only and should be off in production.
```python
from flask import Flask, request, jsonify
import joblib

app = Flask(__name__)

# Wrong: the model is never loaded before prediction
@app.route('/predict', methods=['POST'])
def predict_wrong():
    data = request.get_json(force=True)
    features = data['features']
    # If the model is not loaded anywhere, this raises a NameError;
    # even when it is, the raw NumPy value below may not be JSON-serializable
    prediction = model.predict([features])
    return jsonify({'prediction': prediction[0]})

# Correct: load the model once, before it is used
model = joblib.load('model.joblib')

@app.route('/predict_correct', methods=['POST'])
def predict_correct():
    data = request.get_json(force=True)
    features = data['features']
    prediction = model.predict([features])
    # Cast to a plain int so the value is JSON-serializable
    return jsonify({'prediction': int(prediction[0])})
```
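The "not handling errors" pitfall can be addressed by validating the parsed JSON before calling `predict`. A minimal sketch, assuming four expected features as in the iris example (the helper name and signature are illustrative, not part of any library):

```python
def validate_features(data, n_features=4):
    """Check the parsed JSON body before it reaches model.predict."""
    if not isinstance(data, dict) or 'features' not in data:
        raise ValueError("request body must be JSON with a 'features' key")
    features = data['features']
    if not isinstance(features, list) or len(features) != n_features:
        raise ValueError(f"'features' must be a list of {n_features} numbers")
    if not all(isinstance(x, (int, float)) for x in features):
        raise ValueError("all features must be numbers")
    return features

# Inside the Flask route, wrap it in try/except and return a 400 on bad input:
#   try:
#       features = validate_features(request.get_json(force=True))
#   except ValueError as e:
#       return jsonify({'error': str(e)}), 400

print(validate_features({'features': [5.1, 3.5, 1.4, 0.2]}))  # [5.1, 3.5, 1.4, 0.2]
```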
Quick Reference
- Use `joblib.load()` or `pickle.load()` to load your saved model.
- Use Flask or FastAPI to create a web server for your model.
- Define a POST endpoint that accepts JSON input with features.
- Convert input to the correct shape before calling `model.predict()`.
- Return predictions as JSON responses.
- Test your API locally before deploying.
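The "convert input to the correct shape" item is where many first attempts fail: scikit-learn models expect a 2-D array of shape `(n_samples, n_features)`, even for a single sample. A minimal sketch with NumPy:

```python
import numpy as np

features = [5.1, 3.5, 1.4, 0.2]  # one sample, as a flat list from JSON

# model.predict expects shape (n_samples, n_features)
features_array = np.array(features).reshape(1, -1)
print(features_array.shape)  # (1, 4)
```

Passing the flat list directly (shape `(4,)`) typically raises a shape error, which is why the examples above call `reshape(1, -1)` before predicting.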
Key Takeaways
- Load your trained ML model before serving it via an API.
- Use web frameworks like Flask to create endpoints that accept input and return predictions.
- Ensure the input data format matches what the model expects to avoid errors.
- Add error handling and keep debug mode off in production.
- Test your serving API locally before deploying to production.