
Sequence-to-sequence basics in TensorFlow

Introduction

Sequence-to-sequence models turn one sequence into another, such as an English sentence into its French translation. Typical applications include:

Translating sentences from one language to another.
Turning spoken words into text (speech recognition).
Summarizing long paragraphs into short summaries.
Generating responses in chatbots based on user messages.
Converting handwritten text images into typed text.
Syntax
TensorFlow
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, LSTM, Dense

# Encoder (num_features, units, and num_classes are placeholders you define)
encoder_inputs = Input(shape=(None, num_features))  # None = variable sequence length
encoder_lstm = LSTM(units, return_state=True)
encoder_outputs, state_h, state_c = encoder_lstm(encoder_inputs)
encoder_states = [state_h, state_c]

# Decoder
decoder_inputs = Input(shape=(None, num_features))
decoder_lstm = LSTM(units, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
decoder_dense = Dense(num_classes, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

# Model
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

The encoder reads the input sequence and summarizes it into states.

The decoder uses these states to generate the output sequence step-by-step.
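
During training, the decoder receives the true previous output at each step (teacher forcing). At inference time, it must instead consume its own predictions one step at a time, which is commonly done with two helper models that reuse the trained layers. A minimal sketch, assuming the variables from the syntax block above (the new Input names are illustrative):

TensorFlow
# Encoder model: maps an input sequence to its final states
encoder_model = Model(encoder_inputs, encoder_states)

# Decoder model: runs a single step, taking and returning the LSTM states
decoder_state_input_h = Input(shape=(units,))
decoder_state_input_c = Input(shape=(units,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
step_outputs, state_h, state_c = decoder_lstm(decoder_inputs, initial_state=decoder_states_inputs)
step_outputs = decoder_dense(step_outputs)
decoder_model = Model([decoder_inputs] + decoder_states_inputs, [step_outputs, state_h, state_c])

To generate a sequence, call encoder_model once to get the initial states, then loop decoder_model one timestep at a time, feeding each prediction and the returned states back in until an end marker or a length limit is reached.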

Examples
This creates an encoder that reads sequences of 50-dimensional feature vectors and returns its final hidden and cell states, each with 64 units.
TensorFlow
encoder_inputs = Input(shape=(None, 50))
encoder_lstm = LSTM(64, return_state=True)
encoder_outputs, state_h, state_c = encoder_lstm(encoder_inputs)
encoder_states = [state_h, state_c]
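A quick way to confirm those state shapes is to wrap the encoder in a throwaway model and push a batch of zeros through it (probe is an illustrative name, not part of the example above):
TensorFlow
import numpy as np

probe = Model(encoder_inputs, encoder_states)           # illustrative helper model
state_h, state_c = probe.predict(np.zeros((2, 7, 50)))  # batch of 2, 7 timesteps
print(state_h.shape, state_c.shape)                     # (2, 64) (2, 64)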
This sets up a decoder that, at each timestep, outputs a probability distribution over 100 possible classes.
TensorFlow
decoder_inputs = Input(shape=(None, 50))
decoder_lstm = LSTM(64, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
decoder_dense = Dense(100, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)
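Continuing from the two snippets above, you can wire these into a model and check the output shape; the zero arrays are placeholders just to show the shapes:
TensorFlow
seq2seq = Model([encoder_inputs, decoder_inputs], decoder_outputs)  # illustrative name
out = seq2seq.predict([np.zeros((2, 7, 50)), np.zeros((2, 9, 50))])
print(out.shape)  # (2, 9, 100): batch, decoder timesteps, output classes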
Sample Model

This example builds a simple sequence-to-sequence model with LSTM layers. It trains on random placeholder data for 2 epochs, just to exercise the pipeline, and then predicts the output for one sample.

TensorFlow
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, LSTM, Dense

# Parameters
num_encoder_tokens = 10
num_decoder_tokens = 15
latent_dim = 16

# Encoder
encoder_inputs = Input(shape=(None, num_encoder_tokens))
encoder_lstm = LSTM(latent_dim, return_state=True)
encoder_outputs, state_h, state_c = encoder_lstm(encoder_inputs)
encoder_states = [state_h, state_c]

# Decoder
decoder_inputs = Input(shape=(None, num_decoder_tokens))
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

# Model
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

# Compile
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])

# Dummy data (random values just to exercise the pipeline; real targets
# would be one-hot vectors to match categorical_crossentropy)
encoder_input_data = np.random.random((100, 5, num_encoder_tokens))
decoder_input_data = np.random.random((100, 6, num_decoder_tokens))
decoder_target_data = np.random.random((100, 6, num_decoder_tokens))

# Train
history = model.fit([encoder_input_data, decoder_input_data], decoder_target_data, batch_size=16, epochs=2)

# Predict on one sample
test_encoder_input = encoder_input_data[:1]
test_decoder_input = decoder_input_data[:1]
predictions = model.predict([test_encoder_input, test_decoder_input])

print(f"Predictions shape: {predictions.shape}")
Output
Predictions shape: (1, 6, 15)
Important Notes

Sequence lengths can vary, so the time dimension in the input shapes is None; the same model accepts inputs of any length (see the sketch after these notes).

The encoder's final hidden and cell states are passed to the decoder as its initial state, giving it context about the input.

Softmax activation in the decoder's output layer yields a probability distribution over the output classes at each step; taking the argmax picks the most likely next item (see the sketch after these notes).
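
A minimal sketch, continuing from the sample model above, that decodes the softmax outputs with argmax and shows the variable-length inputs in action:

TensorFlow
# Each timestep of `predictions` is a distribution over num_decoder_tokens
# classes; argmax over the last axis picks the most likely class per step
predicted_ids = np.argmax(predictions, axis=-1)
print(predicted_ids.shape)  # (1, 6): one sequence of 6 class indices

# Because the time dimension is None, the same model accepts other lengths
longer = model.predict([np.random.random((1, 8, num_encoder_tokens)),
                        np.random.random((1, 9, num_decoder_tokens))])
print(longer.shape)  # (1, 9, 15): output length follows the decoder input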

Summary

Sequence-to-sequence models turn one sequence into another, like translating languages.

They use an encoder to understand the input and a decoder to create the output.

TensorFlow makes these models straightforward to build with LSTM layers and the Keras Functional API.