How to Use Flask for Model Serving: Simple Guide
Use Flask to create a web server that listens for requests and returns model predictions. Load your trained model in the Flask app, define an endpoint to receive input data, run the model prediction, and send back the results as JSON.

Syntax
This is the basic pattern to serve a model with Flask:

- `from flask import Flask, request, jsonify`: Import Flask and helpers.
- `app = Flask(__name__)`: Create the Flask app.
- `@app.route('/predict', methods=['POST'])`: Define a URL endpoint for predictions.
- `request.get_json()`: Get input data sent as JSON.
- `model.predict()`: Use your loaded model to predict.
- `jsonify()`: Send prediction results back as JSON.
- `app.run()`: Start the server.
```python
from flask import Flask, request, jsonify

app = Flask(__name__)

# Load your model here (example: model = load_model('model.pkl'))

@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json()
    # Extract features from data
    # prediction = model.predict(features)
    prediction = 'dummy prediction'  # placeholder
    return jsonify({'prediction': prediction})

if __name__ == '__main__':
    app.run(debug=True)
```
Example
This example shows how to serve a simple even/odd "model" with Flask. A plain Python function stands in for a real trained model, so the pattern stays easy to follow. The Flask app receives JSON with a number, predicts, and returns the result.
```python
from flask import Flask, request, jsonify

app = Flask(__name__)

# Simple model function instead of loading a real model:
# returns 'even' or 'odd' based on the input number
def model_predict(x):
    return 'even' if x % 2 == 0 else 'odd'

@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json()
    number = data.get('number')
    if number is None or not isinstance(number, int):
        return jsonify({'error': 'Please provide an integer "number"'}), 400
    prediction = model_predict(number)
    return jsonify({'number': number, 'prediction': prediction})

if __name__ == '__main__':
    app.run(debug=True)
```
Output
* Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
# Example POST request with JSON {"number": 4} returns:
# {"number":4,"prediction":"even"}
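You can exercise the endpoint without starting a server or a separate HTTP client by using Flask's built-in test client. This sketch reuses the even/odd app from above and sends it the same `{"number": 4}` payload:

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

def model_predict(x):
    return 'even' if x % 2 == 0 else 'odd'

@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json()
    number = data.get('number')
    if number is None or not isinstance(number, int):
        return jsonify({'error': 'Please provide an integer "number"'}), 400
    return jsonify({'number': number, 'prediction': model_predict(number)})

# The test client routes requests directly to the app, no running server needed
with app.test_client() as client:
    resp = client.post('/predict', json={'number': 4})
    print(resp.get_json())  # {'number': 4, 'prediction': 'even'}
```

The same `client.post(..., json=...)` call works in unit tests, which makes it easy to check both the happy path and the 400 error branch.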
Common Pitfalls
- Not parsing JSON input correctly: Always use `request.get_json()` to get JSON data.
- Model not loaded before requests: Load your model once before handling requests to avoid delays.
- Wrong HTTP method: Use `POST` for sending data, not `GET`.
- Not handling errors: Validate input and return clear error messages with proper HTTP status codes.
- Running Flask in debug mode in production: Use debug only for development.
```python
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/predict', methods=['GET'])  # Wrong method
# Correct method should be POST:
# @app.route('/predict', methods=['POST'])
def predict():
    # silent=True returns None instead of raising when no JSON is sent
    data = request.get_json(silent=True)
    if not data:
        return jsonify({'error': 'No JSON received'}), 400
    return jsonify({'message': 'Example of the wrong method'})

if __name__ == '__main__':
    app.run(debug=True)
```
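A corrected version of the same route, with the method fixed to `POST`. The test client confirms that Flask now rejects a `GET` request with 405 Method Not Allowed:

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/predict', methods=['POST'])  # correct: data-carrying requests use POST
def predict():
    data = request.get_json(silent=True)  # None instead of an exception on bad input
    if not data:
        return jsonify({'error': 'No JSON received'}), 400
    return jsonify({'received': data})

with app.test_client() as client:
    print(client.get('/predict').status_code)                       # 405: GET not allowed
    print(client.post('/predict', json={'number': 4}).status_code)  # 200
```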
Quick Reference
Remember these key points when using Flask for model serving:
- Load your model once at app start, not inside request handlers.
- Use `POST` requests with JSON input for predictions.
- Validate input data and handle errors gracefully.
- Return predictions as JSON using `jsonify()`.
- Run Flask with `debug=True` only during development.
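The "load once at app start" point is worth seeing in code. This is a minimal sketch of the pattern using the standard-library `pickle` module and a stand-in parity model so it runs anywhere; a real project would typically load a trained scikit-learn model with `joblib.load('model.pkl')` in exactly the same place:

```python
import os
import pickle
import tempfile
from flask import Flask, request, jsonify

# Stand-in for a trained model: any object with a predict() method works
class ParityModel:
    def predict(self, x):
        return 'even' if x % 2 == 0 else 'odd'

# Serialize a model to disk once so this sketch is self-contained
MODEL_PATH = os.path.join(tempfile.gettempdir(), 'model.pkl')
with open(MODEL_PATH, 'wb') as f:
    pickle.dump(ParityModel(), f)

app = Flask(__name__)

# Loaded ONCE at startup, not inside the request handler,
# so every request reuses the same in-memory model
with open(MODEL_PATH, 'rb') as f:
    model = pickle.load(f)

@app.route('/predict', methods=['POST'])
def predict():
    number = request.get_json().get('number')
    return jsonify({'prediction': model.predict(number)})
```

Loading inside `predict()` would deserialize the model on every request, adding latency and memory churn for no benefit.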
Key Takeaways
- Use Flask to create a simple web server that accepts JSON input and returns model predictions.
- Load your machine learning model once when the app starts to improve performance.
- Define a POST endpoint that extracts input data, runs prediction, and returns JSON results.
- Validate inputs and handle errors to avoid server crashes and provide clear feedback.
- Run Flask in debug mode only during development, never in production.