How to Use TensorFlow Serving for Model Deployment
Use TensorFlow Serving to deploy TensorFlow models by exporting your model in SavedModel format and running the tensorflow_model_server binary to serve it. Then, send prediction requests via REST or gRPC to get model outputs in real time.
Syntax
TensorFlow Serving runs a server that loads your exported model and listens for prediction requests. The main command syntax is:
tensorflow_model_server --model_name=MODEL_NAME --model_base_path=PATH_TO_MODEL_DIR [--port=8500] [--rest_api_port=8501]
Here, --model_base_path points to the directory that contains numbered version subfolders (e.g., /1/), not to a version folder itself. The gRPC port defaults to 8500; the REST API is only enabled when you pass --rest_api_port (conventionally 8501).
This starts the server hosting your model for clients to query.
bash
tensorflow_model_server --model_name=my_model --model_base_path=/models/my_model --rest_api_port=8501
Example
This example shows how to export a simple TensorFlow model, start TensorFlow Serving, and send a prediction request using Python.
python
# Step 1: Export a simple TensorFlow model
import tensorflow as tf
import numpy as np

# Create a simple model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(1, input_shape=(3,))
])
model.compile(optimizer='adam', loss='mse')

# Train on dummy data
x = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32)
y = np.array([[10], [20]], dtype=np.float32)
model.fit(x, y, epochs=1)

# Export the model in SavedModel format under a numbered version folder
export_path = '/tmp/my_model/1'
tf.saved_model.save(model, export_path)

# Step 2: Start TensorFlow Serving (run this in a separate terminal).
# Note: --model_base_path points to the parent directory; the server
# discovers the version subfolder (/1) automatically. --rest_api_port
# must be set for the REST request below to work.
# tensorflow_model_server --model_name=my_model \
#     --model_base_path=/tmp/my_model --rest_api_port=8501 &

# Step 3: Send a prediction request using the REST API
import requests
import json

url = 'http://localhost:8501/v1/models/my_model:predict'

# Prepare input data
input_data = {'instances': [[7.0, 8.0, 9.0]]}

# Send POST request
response = requests.post(url, data=json.dumps(input_data))
print('Prediction response:', response.json())
Output
Exact loss and prediction values will differ from run to run, since the model is randomly initialized.
Epoch 1/1
1/1 [==============================] - 0s 2ms/step - loss: 0.0000e+00
Prediction response: {'predictions': [[29.999998]]}
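The REST payload above sends a single instance; batching just adds more rows to the instances list. A minimal sketch of building and parsing such payloads with only the standard library (the helper names build_predict_payload and parse_predictions are our own, not part of TensorFlow Serving):

```python
import json

def build_predict_payload(rows):
    # TensorFlow Serving's REST API expects {"instances": [...]},
    # where each element is one input example.
    return json.dumps({'instances': rows})

def parse_predictions(response_text):
    # Successful responses look like {"predictions": [[...], ...]}.
    return json.loads(response_text)['predictions']

payload = build_predict_payload([[7.0, 8.0, 9.0], [1.0, 2.0, 3.0]])
print(payload)  # {"instances": [[7.0, 8.0, 9.0], [1.0, 2.0, 3.0]]}

preds = parse_predictions('{"predictions": [[29.999998], [5.5]]}')
print(preds)  # [[29.999998], [5.5]]
```

Keeping payload construction in one place makes it easy to swap in the named-input form ({"inputs": ...}) later if your model has multiple signatures.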
Common Pitfalls
Common mistakes when using TensorFlow Serving include:
- Not exporting the model in SavedModel format, which is required.
- Pointing --model_base_path at a version subfolder (e.g., /tmp/my_model/1) instead of its parent; the path must be the directory that contains the numbered version folders.
- Exporting the model without a numbered version subfolder (e.g., /1/), which the server expects on disk.
- Not starting the server before sending requests.
- Sending requests to the wrong port or URL path (the REST API requires --rest_api_port to be set).
- Forgetting to format input data as JSON with the instances key.
bash
# Wrong: model_base_path points at the version folder
tensorflow_model_server --model_name=my_model --model_base_path=/tmp/my_model/1
# Right: point at the parent directory; the server finds /1 automatically
tensorflow_model_server --model_name=my_model --model_base_path=/tmp/my_model --rest_api_port=8501
Quick Reference
| Command/Concept | Description |
|---|---|
| tensorflow_model_server | Starts the TensorFlow Serving server |
| --model_name | Name to assign to your model in the server |
| --model_base_path | Path to the directory containing numbered SavedModel version subfolders (not a version folder itself) |
| REST API URL | http://localhost:8501/v1/models/MODEL_NAME:predict |
| Input format | JSON with key 'instances' containing input data |
| Ports | 8500 for gRPC (--port); REST API enabled via --rest_api_port, conventionally 8501 |
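The REST URL pattern in the table can also be generated programmatically, including the optional per-version endpoint (/v1/models/NAME/versions/N:predict). A small sketch (the predict_url helper is our own name, not a TensorFlow Serving API):

```python
def predict_url(model_name, host='localhost', port=8501, version=None):
    # TensorFlow Serving REST endpoints follow the pattern
    # /v1/models/NAME[/versions/N]:predict
    base = f'http://{host}:{port}/v1/models/{model_name}'
    if version is not None:
        base += f'/versions/{version}'
    return base + ':predict'

print(predict_url('my_model'))
# http://localhost:8501/v1/models/my_model:predict
print(predict_url('my_model', version=1))
# http://localhost:8501/v1/models/my_model/versions/1:predict
```

Pinning a version in the URL is useful when several versions are loaded and you need reproducible results from a specific one.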
Key Takeaways
Export your TensorFlow model in SavedModel format with versioned folders before serving.
Start TensorFlow Serving with the tensorflow_model_server command, pointing --model_base_path at the directory that contains the version folders.
Send prediction requests via REST or gRPC to the server's correct URL and port.
Format input data as JSON with the 'instances' key for REST API requests.
Check server logs and paths carefully to avoid common setup mistakes.
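Since most setup mistakes come down to directory layout, a quick pre-flight check can catch them before you start the server. A minimal sketch using only the standard library (the check_model_dir helper is our own, not part of TensorFlow Serving; it checks the layout the server expects: numbered version subfolders, each with a saved_model.pb):

```python
import os

def check_model_dir(base_path):
    # TensorFlow Serving expects base_path to contain numbered version
    # subdirectories (e.g., 1/), each holding a saved_model.pb file.
    versions = [d for d in os.listdir(base_path) if d.isdigit()]
    if not versions:
        return 'No numbered version subfolders under ' + base_path
    for v in versions:
        if not os.path.exists(os.path.join(base_path, v, 'saved_model.pb')):
            return f'Version {v} is missing saved_model.pb'
    return 'OK: versions ' + ', '.join(sorted(versions))

# Example: check_model_dir('/tmp/my_model') before launching the server
```

Running this against the path you plan to pass as --model_base_path distinguishes "forgot the version folder" from "pointed at the wrong directory" without digging through server logs.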