Sequence-to-sequence models help computers turn one sequence into another, for example translating an English sentence into a French one.
Sequence-to-sequence basics in TensorFlow
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, LSTM, Dense

# Encoder: reads the input sequence and returns its final hidden and cell states
encoder_inputs = Input(shape=(None, num_features))
encoder_lstm = LSTM(units, return_state=True)
encoder_outputs, state_h, state_c = encoder_lstm(encoder_inputs)
encoder_states = [state_h, state_c]

# Decoder: generates the output sequence, initialized with the encoder states
decoder_inputs = Input(shape=(None, num_features))
decoder_lstm = LSTM(units, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
decoder_dense = Dense(num_classes, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

# Model: maps [encoder inputs, decoder inputs] to decoder outputs
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
The encoder reads the input sequence and summarizes it into states.
The decoder uses these states to generate the output sequence step-by-step.
# Encoder: 50 features per timestep, 64 LSTM units
encoder_inputs = Input(shape=(None, 50))
encoder_lstm = LSTM(64, return_state=True)
encoder_outputs, state_h, state_c = encoder_lstm(encoder_inputs)
encoder_states = [state_h, state_c]
# Decoder: starts from the encoder states, softmax over 100 output classes
decoder_inputs = Input(shape=(None, 50))
decoder_lstm = LSTM(64, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
decoder_dense = Dense(100, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)
This example builds a simple sequence-to-sequence model with LSTM layers, trains it on random dummy data for 2 epochs, and then predicts the output for a single sample.
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, LSTM, Dense

# Parameters
num_encoder_tokens = 10
num_decoder_tokens = 15
latent_dim = 16

# Encoder
encoder_inputs = Input(shape=(None, num_encoder_tokens))
encoder_lstm = LSTM(latent_dim, return_state=True)
encoder_outputs, state_h, state_c = encoder_lstm(encoder_inputs)
encoder_states = [state_h, state_c]

# Decoder
decoder_inputs = Input(shape=(None, num_decoder_tokens))
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

# Model
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

# Compile
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])

# Dummy data
encoder_input_data = np.random.random((100, 5, num_encoder_tokens))
decoder_input_data = np.random.random((100, 6, num_decoder_tokens))
decoder_target_data = np.random.random((100, 6, num_decoder_tokens))

# Train
history = model.fit([encoder_input_data, decoder_input_data], decoder_target_data,
                    batch_size=16, epochs=2)

# Predict on one sample
test_encoder_input = encoder_input_data[:1]
test_decoder_input = decoder_input_data[:1]
predictions = model.predict([test_encoder_input, test_decoder_input])
print(f"Predictions shape: {predictions.shape}")
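The prediction above feeds the whole decoder input at once, which matches how the model is trained (teacher forcing). For real inference you usually generate the output one step at a time, feeding each predicted token back into the decoder. Below is a minimal sketch of that loop reusing the layers and variables from the example above; the all-zeros "start" vector and the fixed length of 6 steps are assumptions here, since the dummy data has no real start-of-sequence token.

# Inference models that reuse the trained layers from the example above
encoder_model = Model(encoder_inputs, encoder_states)

decoder_state_input_h = Input(shape=(latent_dim,))
decoder_state_input_c = Input(shape=(latent_dim,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
decoder_outputs_inf, state_h_inf, state_c_inf = decoder_lstm(
    decoder_inputs, initial_state=decoder_states_inputs)
decoder_outputs_inf = decoder_dense(decoder_outputs_inf)
decoder_model = Model([decoder_inputs] + decoder_states_inputs,
                      [decoder_outputs_inf, state_h_inf, state_c_inf])

def decode_sequence(input_seq, max_len=6):
    # Encode the input once, then run only the decoder step by step
    states = encoder_model.predict(input_seq, verbose=0)
    # Hypothetical "start" vector; a real model would use a start-of-sequence token
    target_seq = np.zeros((1, 1, num_decoder_tokens))
    decoded = []
    for _ in range(max_len):
        output, h, c = decoder_model.predict([target_seq] + states, verbose=0)
        token_index = int(np.argmax(output[0, -1, :]))
        decoded.append(token_index)
        # Feed the predicted token back in as the next decoder input
        target_seq = np.zeros((1, 1, num_decoder_tokens))
        target_seq[0, 0, token_index] = 1.0
        states = [h, c]
    return decoded

print(decode_sequence(test_encoder_input))

The input sequence is encoded only once; each loop iteration runs just the decoder with its updated states.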
Sequence lengths can vary, so we use None for the timestep dimension in the input shapes.
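For instance, the model from the example above was built with shape=(None, num_encoder_tokens) and shape=(None, num_decoder_tokens), so batches with different numbers of timesteps are all valid inputs; sequences within one batch still need a common (or padded) length. A small sketch with arbitrary batch sizes and lengths:

# The model above was built with Input(shape=(None, ...)), so the number of
# timesteps is not fixed. Both of these batches are valid inputs:
short_enc = np.random.random((2, 5, num_encoder_tokens))    # 5 encoder timesteps
short_dec = np.random.random((2, 6, num_decoder_tokens))    # 6 decoder timesteps
long_enc = np.random.random((2, 12, num_encoder_tokens))    # 12 encoder timesteps
long_dec = np.random.random((2, 9, num_decoder_tokens))     # 9 decoder timesteps

print(model.predict([short_enc, short_dec], verbose=0).shape)  # (2, 6, 15)
print(model.predict([long_enc, long_dec], verbose=0).shape)    # (2, 9, 15)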
Return states from the encoder to pass to the decoder for context.
Softmax activation in the decoder output layer turns each timestep's output into a probability distribution over tokens, from which the most likely next item can be picked.
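Concretely, each timestep of the predictions from the example above is a probability distribution over num_decoder_tokens classes, so an argmax along the last axis gives the predicted token index at each step; a quick sketch:

# predictions comes from the example above and has shape (1, 6, num_decoder_tokens);
# the softmax makes each timestep a probability distribution over tokens
predicted_indices = np.argmax(predictions, axis=-1)  # most likely token per timestep
print(predicted_indices.shape)  # (1, 6)
print(predicted_indices)        # actual values depend on the random weights and data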
Sequence-to-sequence models turn one sequence into another, like translating languages.
They use an encoder to understand the input and a decoder to create the output.
TensorFlow makes it easy to build these models with LSTM layers and the functional API.