How to Use FastAPI for Model Serving: Simple Guide
Use FastAPI to create a web server that accepts input data, loads your machine learning model, and returns predictions. Define API endpoints with @app.post decorators, process the input, and respond with model outputs in JSON format.
Syntax
FastAPI uses Python functions decorated with @app.post or @app.get to define API endpoints. You load your model once, then use the endpoint function to receive input, run the model, and return predictions.
- app = FastAPI(): creates the API app.
- @app.post('/predict'): defines a POST endpoint at '/predict'.
- def predict(data: InputModel): receives input data validated by Pydantic.
- return {'prediction': result}: sends the prediction back as JSON.
```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class InputModel(BaseModel):
    feature1: float
    feature2: float

@app.post('/predict')
def predict(data: InputModel):
    # model prediction logic here
    result = data.feature1 + data.feature2
    return {'prediction': result}
```
Example
This example shows a simple FastAPI app that serves a dummy model adding two features. It accepts JSON input, runs the model, and returns the prediction.
```python
from fastapi import FastAPI
from pydantic import BaseModel
import uvicorn

app = FastAPI()

class InputModel(BaseModel):
    feature1: float
    feature2: float

@app.post('/predict')
def predict(data: InputModel):
    # Dummy model: sum of features
    prediction = data.feature1 + data.feature2
    return {'prediction': prediction}

if __name__ == '__main__':
    uvicorn.run(app, host='127.0.0.1', port=8000)
```
Output
INFO: Started server process [12345]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
Common Pitfalls
- Not validating input data can cause server errors; always use Pydantic models.
- Loading the model inside the endpoint function slows every response; load it once globally.
- Forgetting to run the server with uvicorn, or using the wrong host/port, causes connection issues.
- Not returning JSON-serializable data leads to errors; always return a dict or a Pydantic model.
```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class InputModel(BaseModel):
    feature1: float
    feature2: float

class Model:
    def predict(self, data):
        return data.feature1 + data.feature2

def load_model():
    # Dummy load_model function
    return Model()

# Wrong: loading the model inside the endpoint (slow)
@app.post('/predict_slow')
def predict_slow(data: InputModel):
    model = load_model()  # this slows every request
    result = model.predict(data)
    return {'prediction': result}

# Right: load the model once at startup
model = load_model()

@app.post('/predict')
def predict(data: InputModel):
    result = model.predict(data)
    return {'prediction': result}
```
Quick Reference
FastAPI model serving quick tips:
- Use BaseModel from Pydantic for input validation.
- Load your ML model once at startup, not per request.
- Define POST endpoints for prediction requests.
- Return predictions as JSON dictionaries.
- Run the server with uvicorn main:app --reload for development.
Key Takeaways
- Use FastAPI endpoints with Pydantic models to receive and validate input data.
- Load your machine learning model once globally to avoid slow responses.
- Return predictions as JSON dictionaries for easy client consumption.
- Run the FastAPI app with Uvicorn to serve your model over HTTP.
- Avoid loading models inside endpoint functions to keep requests fast.