Sequence-to-sequence models help computers turn one sequence into another, for example translating an English sentence into a French one.
Sequence-to-sequence basics in TensorFlow
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, LSTM, Dense

# Encoder: reads the input sequence and returns its final hidden and cell states
encoder_inputs = Input(shape=(None, num_features))
encoder_lstm = LSTM(units, return_state=True)
encoder_outputs, state_h, state_c = encoder_lstm(encoder_inputs)
encoder_states = [state_h, state_c]

# Decoder: generates the output sequence, initialized with the encoder states
decoder_inputs = Input(shape=(None, num_features))
decoder_lstm = LSTM(units, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
decoder_dense = Dense(num_classes, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

# Model: maps [encoder inputs, decoder inputs] to decoder outputs
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
The encoder reads the input sequence and summarizes it into states.
The decoder uses these states to generate the output sequence step-by-step.
# Encoder: 50 features per timestep, 64 LSTM units
encoder_inputs = Input(shape=(None, 50))
encoder_lstm = LSTM(64, return_state=True)
encoder_outputs, state_h, state_c = encoder_lstm(encoder_inputs)
encoder_states = [state_h, state_c]
# Decoder: starts from the encoder states, softmax over 100 output classes
decoder_inputs = Input(shape=(None, 50))
decoder_lstm = LSTM(64, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
decoder_dense = Dense(100, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)
This example builds a simple sequence-to-sequence model with LSTM layers, trains it on random dummy data for 2 epochs, and then predicts the output for a single sample.
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, LSTM, Dense

# Parameters
num_encoder_tokens = 10
num_decoder_tokens = 15
latent_dim = 16

# Encoder
encoder_inputs = Input(shape=(None, num_encoder_tokens))
encoder_lstm = LSTM(latent_dim, return_state=True)
encoder_outputs, state_h, state_c = encoder_lstm(encoder_inputs)
encoder_states = [state_h, state_c]

# Decoder
decoder_inputs = Input(shape=(None, num_decoder_tokens))
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

# Model
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

# Compile
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])

# Dummy data
encoder_input_data = np.random.random((100, 5, num_encoder_tokens))
decoder_input_data = np.random.random((100, 6, num_decoder_tokens))
decoder_target_data = np.random.random((100, 6, num_decoder_tokens))

# Train
history = model.fit([encoder_input_data, decoder_input_data], decoder_target_data,
                    batch_size=16, epochs=2)

# Predict on one sample
test_encoder_input = encoder_input_data[:1]
test_decoder_input = decoder_input_data[:1]
predictions = model.predict([test_encoder_input, test_decoder_input])
print(f"Predictions shape: {predictions.shape}")
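The prediction above feeds the whole decoder input at once, which matches how the model is trained (teacher forcing). For real inference you usually generate the output one step at a time, feeding each predicted token back into the decoder. Below is a minimal sketch of that loop reusing the layers and variables from the example above; the all-zeros "start" vector and the fixed length of 6 steps are assumptions here, since the dummy data has no real start-of-sequence token.

# Inference models that reuse the trained layers from the example above
encoder_model = Model(encoder_inputs, encoder_states)

decoder_state_input_h = Input(shape=(latent_dim,))
decoder_state_input_c = Input(shape=(latent_dim,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
decoder_outputs_inf, state_h_inf, state_c_inf = decoder_lstm(
    decoder_inputs, initial_state=decoder_states_inputs)
decoder_outputs_inf = decoder_dense(decoder_outputs_inf)
decoder_model = Model([decoder_inputs] + decoder_states_inputs,
                      [decoder_outputs_inf, state_h_inf, state_c_inf])

def decode_sequence(input_seq, max_len=6):
    # Encode the input once, then run only the decoder step by step
    states = encoder_model.predict(input_seq, verbose=0)
    # Hypothetical "start" vector; a real model would use a start-of-sequence token
    target_seq = np.zeros((1, 1, num_decoder_tokens))
    decoded = []
    for _ in range(max_len):
        output, h, c = decoder_model.predict([target_seq] + states, verbose=0)
        token_index = int(np.argmax(output[0, -1, :]))
        decoded.append(token_index)
        # Feed the predicted token back in as the next decoder input
        target_seq = np.zeros((1, 1, num_decoder_tokens))
        target_seq[0, 0, token_index] = 1.0
        states = [h, c]
    return decoded

print(decode_sequence(test_encoder_input))

The input sequence is encoded only once; each loop iteration runs just the decoder with its updated states.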
Sequence lengths can vary, so we use None for the timestep dimension in the input shapes.
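For instance, the model from the example above was built with shape=(None, num_encoder_tokens) and shape=(None, num_decoder_tokens), so batches with different numbers of timesteps are all valid inputs; sequences within one batch still need a common (or padded) length. A small sketch with arbitrary batch sizes and lengths:

# The model above was built with Input(shape=(None, ...)), so the number of
# timesteps is not fixed. Both of these batches are valid inputs:
short_enc = np.random.random((2, 5, num_encoder_tokens))    # 5 encoder timesteps
short_dec = np.random.random((2, 6, num_decoder_tokens))    # 6 decoder timesteps
long_enc = np.random.random((2, 12, num_encoder_tokens))    # 12 encoder timesteps
long_dec = np.random.random((2, 9, num_decoder_tokens))     # 9 decoder timesteps

print(model.predict([short_enc, short_dec], verbose=0).shape)  # (2, 6, 15)
print(model.predict([long_enc, long_dec], verbose=0).shape)    # (2, 9, 15)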
Return states from the encoder to pass to the decoder for context.
Softmax activation in the decoder output layer turns each timestep's output into a probability distribution over tokens, from which the most likely next item can be picked.
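Concretely, each timestep of the predictions from the example above is a probability distribution over num_decoder_tokens classes, so an argmax along the last axis gives the predicted token index at each step; a quick sketch:

# predictions comes from the example above and has shape (1, 6, num_decoder_tokens);
# the softmax makes each timestep a probability distribution over tokens
predicted_indices = np.argmax(predictions, axis=-1)  # most likely token per timestep
print(predicted_indices.shape)  # (1, 6)
print(predicted_indices)        # actual values depend on the random weights and data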
Sequence-to-sequence models turn one sequence into another, like translating languages.
They use an encoder to understand the input and a decoder to create the output.
TensorFlow makes it easy to build these models with LSTM layers and the functional API.