How to Use TensorFlow Serving for Model Deployment
Use TensorFlow Serving to deploy TensorFlow models by exporting your model in SavedModel format and running the tensorflow_model_server binary to serve it. Then, send prediction requests via REST or gRPC to get model outputs in real time.
Syntax
TensorFlow Serving runs a server that loads your exported model and listens for prediction requests. The main command syntax is:
tensorflow_model_server --model_name=MODEL_NAME --model_base_path=PATH_TO_MODEL_DIR [--port=8500] [--rest_api_port=8501]
Here, --model_base_path points to the directory that contains numbered version subfolders (e.g., /1/), not to a version folder itself. The gRPC port defaults to 8500; the REST API is only enabled when you pass --rest_api_port (conventionally 8501).
This starts the server hosting your model for clients to query.
bash
tensorflow_model_server --model_name=my_model --model_base_path=/models/my_model --rest_api_port=8501
Example
This example shows how to export a simple TensorFlow model, start TensorFlow Serving, and send a prediction request using Python.
python
# Step 1: Export a simple TensorFlow model
import tensorflow as tf
import numpy as np

# Create a simple model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(1, input_shape=(3,))
])
model.compile(optimizer='adam', loss='mse')

# Train on dummy data
x = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32)
y = np.array([[10], [20]], dtype=np.float32)
model.fit(x, y, epochs=1)

# Export the model in SavedModel format under a numbered version folder
export_path = '/tmp/my_model/1'
tf.saved_model.save(model, export_path)

# Step 2: Start TensorFlow Serving (run this in a separate terminal).
# Note: --model_base_path points to the parent directory; the server
# discovers the version subfolder (/1) automatically. --rest_api_port
# must be set for the REST request below to work.
# tensorflow_model_server --model_name=my_model \
#     --model_base_path=/tmp/my_model --rest_api_port=8501 &

# Step 3: Send a prediction request using the REST API
import requests
import json

url = 'http://localhost:8501/v1/models/my_model:predict'

# Prepare input data
input_data = {'instances': [[7.0, 8.0, 9.0]]}

# Send POST request
response = requests.post(url, data=json.dumps(input_data))
print('Prediction response:', response.json())
Output
Exact loss and prediction values will differ from run to run, since the model is randomly initialized.
Epoch 1/1
1/1 [==============================] - 0s 2ms/step - loss: 0.0000e+00
Prediction response: {'predictions': [[29.999998]]}
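The REST payload above sends a single instance; batching just adds more rows to the instances list. A minimal sketch of building and parsing such payloads with only the standard library (the helper names build_predict_payload and parse_predictions are our own, not part of TensorFlow Serving):

```python
import json

def build_predict_payload(rows):
    # TensorFlow Serving's REST API expects {"instances": [...]},
    # where each element is one input example.
    return json.dumps({'instances': rows})

def parse_predictions(response_text):
    # Successful responses look like {"predictions": [[...], ...]}.
    return json.loads(response_text)['predictions']

payload = build_predict_payload([[7.0, 8.0, 9.0], [1.0, 2.0, 3.0]])
print(payload)  # {"instances": [[7.0, 8.0, 9.0], [1.0, 2.0, 3.0]]}

preds = parse_predictions('{"predictions": [[29.999998], [5.5]]}')
print(preds)  # [[29.999998], [5.5]]
```

Keeping payload construction in one place makes it easy to swap in the named-input form ({"inputs": ...}) later if your model has multiple signatures.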
Common Pitfalls
Common mistakes when using TensorFlow Serving include:
- Not exporting the model in SavedModel format, which is required.
- Pointing --model_base_path at a version subfolder (e.g., /tmp/my_model/1) instead of its parent; the path must be the directory that contains the numbered version folders.
- Exporting the model without a numbered version subfolder (e.g., /1/), which the server expects on disk.
- Not starting the server before sending requests.
- Sending requests to the wrong port or URL path (the REST API requires --rest_api_port to be set).
- Forgetting to format input data as JSON with the instances key.
bash
# Wrong: model_base_path points at the version folder
tensorflow_model_server --model_name=my_model --model_base_path=/tmp/my_model/1
# Right: point at the parent directory; the server finds /1 automatically
tensorflow_model_server --model_name=my_model --model_base_path=/tmp/my_model --rest_api_port=8501
Quick Reference
| Command/Concept | Description |
|---|---|
| tensorflow_model_server | Starts the TensorFlow Serving server |
| --model_name | Name to assign to your model in the server |
| --model_base_path | Path to the directory containing numbered SavedModel version subfolders (not a version folder itself) |
| REST API URL | http://localhost:8501/v1/models/MODEL_NAME:predict |
| Input format | JSON with key 'instances' containing input data |
| Ports | 8500 for gRPC (--port); REST API enabled via --rest_api_port, conventionally 8501 |
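The REST URL pattern in the table can also be generated programmatically, including the optional per-version endpoint (/v1/models/NAME/versions/N:predict). A small sketch (the predict_url helper is our own name, not a TensorFlow Serving API):

```python
def predict_url(model_name, host='localhost', port=8501, version=None):
    # TensorFlow Serving REST endpoints follow the pattern
    # /v1/models/NAME[/versions/N]:predict
    base = f'http://{host}:{port}/v1/models/{model_name}'
    if version is not None:
        base += f'/versions/{version}'
    return base + ':predict'

print(predict_url('my_model'))
# http://localhost:8501/v1/models/my_model:predict
print(predict_url('my_model', version=1))
# http://localhost:8501/v1/models/my_model/versions/1:predict
```

Pinning a version in the URL is useful when several versions are loaded and you need reproducible results from a specific one.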
Key Takeaways
Export your TensorFlow model in SavedModel format with versioned folders before serving.
Start TensorFlow Serving with the tensorflow_model_server command, pointing --model_base_path at the directory that contains the version folders.
Send prediction requests via REST or gRPC to the server's correct URL and port.
Format input data as JSON with the 'instances' key for REST API requests.
Check server logs and paths carefully to avoid common setup mistakes.
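Since most setup mistakes come down to directory layout, a quick pre-flight check can catch them before you start the server. A minimal sketch using only the standard library (the check_model_dir helper is our own, not part of TensorFlow Serving; it checks the layout the server expects: numbered version subfolders, each with a saved_model.pb):

```python
import os

def check_model_dir(base_path):
    # TensorFlow Serving expects base_path to contain numbered version
    # subdirectories (e.g., 1/), each holding a saved_model.pb file.
    versions = [d for d in os.listdir(base_path) if d.isdigit()]
    if not versions:
        return 'No numbered version subfolders under ' + base_path
    for v in versions:
        if not os.path.exists(os.path.join(base_path, v, 'saved_model.pb')):
            return f'Version {v} is missing saved_model.pb'
    return 'OK: versions ' + ', '.join(sorted(versions))

# Example: check_model_dir('/tmp/my_model') before launching the server
```

Running this against the path you plan to pass as --model_base_path distinguishes "forgot the version folder" from "pointed at the wrong directory" without digging through server logs.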