Performance: LangServe for API deployment
MEDIUM IMPACT
LangServe serves runnables over FastAPI, so where the model is loaded determines API response time and server resource usage: loading inside the request handler repeats the load cost on every call, while loading once at process startup keeps per-request latency down to inference time.
[OK] Good: load the model once at startup, so every request reuses the same instance:

```python
from fastapi import FastAPI
from langserve import add_routes
from langchain_core.runnables import RunnableLambda

model = load_model('large-model')  # Load once at startup

app = FastAPI()
add_routes(app, RunnableLambda(lambda input_text: model.generate(input_text)), path="/predict")
```
[X] Bad: load the model inside the handler, so every request pays the full load cost:

```python
from fastapi import FastAPI
from langserve import add_routes
from langchain_core.runnables import RunnableLambda

app = FastAPI()

def predict(input_text: str):
    # Anti-pattern: the model is reloaded on every request
    model = load_model('large-model')
    return model.generate(input_text)

add_routes(app, RunnableLambda(predict), path="/predict")
```
| Pattern | Model Loading | API Latency | Server CPU Usage | Verdict |
|---|---|---|---|---|
| Load model per request | Repeated loading | High (seconds) | High CPU spikes | [X] Bad |
| Load model once at startup | Single loading | Low (milliseconds) | Stable CPU | [OK] Good |
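When loading at import time is undesirable (e.g. the model is not needed in every worker), the load-once behavior can also be expressed as a lazy singleton: the first request pays the load cost and all later requests reuse the cached instance. A minimal, dependency-free sketch of the idea — `FakeModel` and `get_model` are illustrative stand-ins, not LangServe APIs:

```python
import functools
import time

class FakeModel:
    """Stand-in for an expensive-to-load model."""
    def generate(self, text: str) -> str:
        return f"echo: {text}"

@functools.lru_cache(maxsize=1)
def get_model() -> FakeModel:
    # Simulates a slow load; this body runs only on the first call.
    time.sleep(0.1)
    return FakeModel()

def predict(input_text: str) -> str:
    # Every request reuses the single cached instance.
    return get_model().generate(input_text)
```

Wrapping `predict` in a `RunnableLambda` and registering it with `add_routes` then gives the same stable-latency profile as the startup-load pattern, without loading the model in processes that never serve a request.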