How to Use FastAPI for Model Serving: Simple Guide
Use FastAPI to create a web server that accepts input data, loads your machine learning model, and returns predictions. Define API endpoints with @app.post decorators, process the input, and respond with model outputs in JSON format.
Syntax
FastAPI uses Python functions decorated with @app.post or @app.get to define API endpoints. You load your model once, then use the endpoint function to receive input, run the model, and return predictions.
- app = FastAPI(): creates the API app.
- @app.post('/predict'): defines a POST endpoint at '/predict'.
- def predict(data: InputModel): receives input data validated by Pydantic.
- return {'prediction': result}: sends the prediction back as JSON.
```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class InputModel(BaseModel):
    feature1: float
    feature2: float

@app.post('/predict')
def predict(data: InputModel):
    # model prediction logic here
    result = data.feature1 + data.feature2
    return {'prediction': result}
```
Example
This example shows a simple FastAPI app that serves a dummy model adding two features. It accepts JSON input, runs the model, and returns the prediction.
```python
from fastapi import FastAPI
from pydantic import BaseModel
import uvicorn

app = FastAPI()

class InputModel(BaseModel):
    feature1: float
    feature2: float

@app.post('/predict')
def predict(data: InputModel):
    # Dummy model: sum of features
    prediction = data.feature1 + data.feature2
    return {'prediction': prediction}

if __name__ == '__main__':
    uvicorn.run(app, host='127.0.0.1', port=8000)
```
Output
INFO: Started server process [12345]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
Common Pitfalls
- Not validating input data can cause server errors; always use Pydantic models.
- Loading the model inside the endpoint function slows every response; load it once globally.
- Forgetting to run the server with uvicorn, or using the wrong host/port, causes connection issues.
- Not returning JSON-serializable data leads to errors; always return a dict or a Pydantic model.
```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class InputModel(BaseModel):
    feature1: float
    feature2: float

class Model:
    def predict(self, data):
        return data.feature1 + data.feature2

def load_model():
    # Dummy load_model function
    return Model()

# Wrong: loading the model inside the endpoint (slow)
@app.post('/predict_slow')
def predict_slow(data: InputModel):
    model = load_model()  # this slows every request
    result = model.predict(data)
    return {'prediction': result}

# Right: load the model once at startup
model = load_model()

@app.post('/predict')
def predict(data: InputModel):
    result = model.predict(data)
    return {'prediction': result}
```
Quick Reference
FastAPI model serving quick tips:
- Use BaseModel from Pydantic for input validation.
- Load your ML model once at startup, not per request.
- Define POST endpoints for prediction requests.
- Return predictions as JSON dictionaries.
- Run the server with uvicorn main:app --reload for development.
Key Takeaways
- Use FastAPI endpoints with Pydantic models to receive and validate input data.
- Load your machine learning model once globally to avoid slow responses.
- Return predictions as JSON dictionaries for easy client consumption.
- Run the FastAPI app with Uvicorn to serve your model over HTTP.
- Avoid loading models inside endpoint functions to keep requests fast.