How to Use Flask for Model Serving: Simple Guide
Use Flask to create a web server that listens for requests and returns model predictions. Load your trained model in the Flask app, define an endpoint to receive input data, run the model prediction, and send back the results as JSON.

Syntax
This is the basic pattern to serve a model with Flask:

- `from flask import Flask, request, jsonify`: Import Flask and helpers.
- `app = Flask(__name__)`: Create the Flask app.
- `@app.route('/predict', methods=['POST'])`: Define a URL endpoint for predictions.
- `request.get_json()`: Get input data sent as JSON.
- `model.predict()`: Use your loaded model to predict.
- `jsonify()`: Send prediction results back as JSON.
- `app.run()`: Start the server.
```python
from flask import Flask, request, jsonify

app = Flask(__name__)

# Load your model here (example: model = load_model('model.pkl'))

@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json()
    # Extract features from data
    # prediction = model.predict(features)
    prediction = 'dummy prediction'  # placeholder
    return jsonify({'prediction': prediction})

if __name__ == '__main__':
    app.run(debug=True)
```
Example
This example shows how to serve a simple even/odd "model" with Flask. A plain Python function stands in for a real trained model, so the pattern stays easy to follow. The Flask app receives JSON with a number, predicts, and returns the result.
```python
from flask import Flask, request, jsonify

app = Flask(__name__)

# Simple model function instead of loading a real model:
# returns 'even' or 'odd' based on the input number
def model_predict(x):
    return 'even' if x % 2 == 0 else 'odd'

@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json()
    number = data.get('number')
    if number is None or not isinstance(number, int):
        return jsonify({'error': 'Please provide an integer "number"'}), 400
    prediction = model_predict(number)
    return jsonify({'number': number, 'prediction': prediction})

if __name__ == '__main__':
    app.run(debug=True)
```
Output
* Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
# Example POST request with JSON {"number": 4} returns:
# {"number":4,"prediction":"even"}
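You can exercise the endpoint without starting a server or a separate HTTP client by using Flask's built-in test client. This sketch reuses the even/odd app from above and sends it the same `{"number": 4}` payload:

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

def model_predict(x):
    return 'even' if x % 2 == 0 else 'odd'

@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json()
    number = data.get('number')
    if number is None or not isinstance(number, int):
        return jsonify({'error': 'Please provide an integer "number"'}), 400
    return jsonify({'number': number, 'prediction': model_predict(number)})

# The test client routes requests directly to the app, no running server needed
with app.test_client() as client:
    resp = client.post('/predict', json={'number': 4})
    print(resp.get_json())  # {'number': 4, 'prediction': 'even'}
```

The same `client.post(..., json=...)` call works in unit tests, which makes it easy to check both the happy path and the 400 error branch.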
Common Pitfalls
- Not parsing JSON input correctly: Always use `request.get_json()` to get JSON data.
- Model not loaded before requests: Load your model once before handling requests to avoid delays.
- Wrong HTTP method: Use `POST` for sending data, not `GET`.
- Not handling errors: Validate input and return clear error messages with proper HTTP status codes.
- Running Flask in debug mode in production: Use debug only for development.
```python
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/predict', methods=['GET'])  # Wrong method
# Correct method should be POST:
# @app.route('/predict', methods=['POST'])
def predict():
    # silent=True returns None instead of raising when no JSON is sent
    data = request.get_json(silent=True)
    if not data:
        return jsonify({'error': 'No JSON received'}), 400
    return jsonify({'message': 'Example of the wrong method'})

if __name__ == '__main__':
    app.run(debug=True)
```
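A corrected version of the same route, with the method fixed to `POST`. The test client confirms that Flask now rejects a `GET` request with 405 Method Not Allowed:

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/predict', methods=['POST'])  # correct: data-carrying requests use POST
def predict():
    data = request.get_json(silent=True)  # None instead of an exception on bad input
    if not data:
        return jsonify({'error': 'No JSON received'}), 400
    return jsonify({'received': data})

with app.test_client() as client:
    print(client.get('/predict').status_code)                       # 405: GET not allowed
    print(client.post('/predict', json={'number': 4}).status_code)  # 200
```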
Quick Reference
Remember these key points when using Flask for model serving:
- Load your model once at app start, not inside request handlers.
- Use `POST` requests with JSON input for predictions.
- Validate input data and handle errors gracefully.
- Return predictions as JSON using `jsonify()`.
- Run Flask with `debug=True` only during development.
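The "load once at app start" point is worth seeing in code. This is a minimal sketch of the pattern using the standard-library `pickle` module and a stand-in parity model so it runs anywhere; a real project would typically load a trained scikit-learn model with `joblib.load('model.pkl')` in exactly the same place:

```python
import os
import pickle
import tempfile
from flask import Flask, request, jsonify

# Stand-in for a trained model: any object with a predict() method works
class ParityModel:
    def predict(self, x):
        return 'even' if x % 2 == 0 else 'odd'

# Serialize a model to disk once so this sketch is self-contained
MODEL_PATH = os.path.join(tempfile.gettempdir(), 'model.pkl')
with open(MODEL_PATH, 'wb') as f:
    pickle.dump(ParityModel(), f)

app = Flask(__name__)

# Loaded ONCE at startup, not inside the request handler,
# so every request reuses the same in-memory model
with open(MODEL_PATH, 'rb') as f:
    model = pickle.load(f)

@app.route('/predict', methods=['POST'])
def predict():
    number = request.get_json().get('number')
    return jsonify({'prediction': model.predict(number)})
```

Loading inside `predict()` would deserialize the model on every request, adding latency and memory churn for no benefit.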
Key Takeaways
- Use Flask to create a simple web server that accepts JSON input and returns model predictions.
- Load your machine learning model once when the app starts to improve performance.
- Define a POST endpoint that extracts input data, runs prediction, and returns JSON results.
- Validate inputs and handle errors to avoid server crashes and provide clear feedback.
- Run Flask in debug mode only during development, never in production.