How to Use a Seq2Seq Model in NLP: A Simple Guide with an Example

A seq2seq model in NLP transforms one sequence into another, like translating sentences. It uses an encoder to read input and a decoder to generate output step-by-step. You train it on paired input-output sequences and then use it to predict new outputs.

📐

Syntax

A typical seq2seq model has two main parts:

Encoder: Reads the input sequence and creates a summary (context vector).
Decoder: Uses the summary to generate the output sequence one token at a time.

In code, you define encoder and decoder layers, then connect them in a model that takes input sequences and outputs predicted sequences.

python

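import tensorflow as tf
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model

# Placeholder sizes (assumed values; set them to match your data)
num_encoder_tokens = 10  # input vocabulary size
num_decoder_tokens = 10  # output vocabulary size
latent_dim = 16          # size of the LSTM hidden and cell states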

# Encoder
encoder_inputs = Input(shape=(None, num_encoder_tokens))
encoder_lstm = LSTM(latent_dim, return_state=True)
encoder_outputs, state_h, state_c = encoder_lstm(encoder_inputs)
encoder_states = [state_h, state_c]

# Decoder
decoder_inputs = Input(shape=(None, num_decoder_tokens))
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

# Seq2seq model
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

💻

Example

This example shows a simple seq2seq model for translating short sequences of numbers to their reversed form. It trains on pairs like [1,2,3] -> [3,2,1].

python

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model

# Parameters
num_encoder_tokens = 10
num_decoder_tokens = 10
latent_dim = 16
max_seq_length = 5

# Generate dummy data: input sequences and reversed output sequences
encoder_input_data = np.random.randint(1, num_encoder_tokens, size=(1000, max_seq_length))
decoder_target_data = np.flip(encoder_input_data, axis=1)

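# Teacher forcing: the decoder input is the target shifted right by one timestep,
# with token 0 (never produced by the data, which draws from 1..9) as the start token
decoder_input_data = np.zeros_like(decoder_target_data)
decoder_input_data[:, 1:] = decoder_target_data[:, :-1]

# One-hot encode inputs and outputs
encoder_input_data_oh = tf.one_hot(encoder_input_data, num_encoder_tokens)
decoder_input_data_oh = tf.one_hot(decoder_input_data, num_decoder_tokens)
decoder_target_data_oh = tf.one_hot(decoder_target_data, num_decoder_tokens)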

# Build model
encoder_inputs = Input(shape=(None, num_encoder_tokens))
encoder_lstm = LSTM(latent_dim, return_state=True)
encoder_outputs, state_h, state_c = encoder_lstm(encoder_inputs)
encoder_states = [state_h, state_c]

decoder_inputs = Input(shape=(None, num_decoder_tokens))
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)

model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])

# Train model
model.fit([encoder_input_data_oh, decoder_input_data_oh], decoder_target_data_oh, batch_size=64, epochs=5)

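# Predict on a sample with greedy decoding: begin with the start token (index 0)
# and feed each predicted token back in as the next decoder input
sample_input = np.array(encoder_input_data_oh[0:1])
sample_decoder_input = np.zeros((1, max_seq_length, num_decoder_tokens))
sample_decoder_input[0, 0, 0] = 1  # start token
predicted_seq = []
for t in range(max_seq_length):
    # The LSTM only looks backward, so the all-zero rows after step t
    # do not affect the prediction at step t
    predictions = model.predict([sample_input, sample_decoder_input], verbose=0)
    token = int(np.argmax(predictions[0, t]))
    predicted_seq.append(token)
    if t + 1 < max_seq_length:
        sample_decoder_input[0, t + 1, token] = 1

print('Input sequence:', np.argmax(sample_input[0], axis=1))
print('Predicted reversed sequence:', np.array(predicted_seq))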

Output

Epoch 1/5
16/16 [==============================] - 2s 16ms/step - loss: 2.3026 - accuracy: 0.1000
Epoch 2/5
16/16 [==============================] - 0s 9ms/step - loss: 2.3017 - accuracy: 0.1000
Epoch 3/5
16/16 [==============================] - 0s 9ms/step - loss: 2.3010 - accuracy: 0.1000
Epoch 4/5
16/16 [==============================] - 0s 9ms/step - loss: 2.3003 - accuracy: 0.1000
Epoch 5/5
16/16 [==============================] - 0s 9ms/step - loss: 2.2996 - accuracy: 0.1000
1/1 [==============================] - 0s 131ms/step
Input sequence: [3 7 2 5 1]
Predicted reversed sequence: [0 0 0 0 0]
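
One way to read a log like this: with a softmax over 10 tokens, a model that has learned nothing gives each token a probability of about 1/10. That corresponds to a categorical cross-entropy of ln(10), roughly 2.3026, and an accuracy of about 0.10. The sketch below only does that arithmetic with NumPy. `chance_level_loss` is an illustrative helper name, not a Keras function.

```python
import numpy as np

def chance_level_loss(num_classes):
    """Cross-entropy of a uniform prediction against a one-hot target."""
    uniform = np.full(num_classes, 1.0 / num_classes)
    target = np.zeros(num_classes)
    target[0] = 1.0  # any class gives the same value for a uniform prediction
    return float(-np.sum(target * np.log(uniform)))

print(round(chance_level_loss(10), 4))  # 2.3026
```

A loss that stays close to this value suggests the model has not started learning yet. Training for more epochs and checking that the decoder inputs are shifted correctly are reasonable next steps. Numbers in a real run will differ, since the data and initial weights are random.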

⚠️

Common Pitfalls

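1. Not using teacher forcing: During training, feed the decoder the true previous token as its input at each step. Without this, the decoder conditions on its own early mistakes, which slows and destabilizes training.

2. Mismatched input/output shapes: Encoder inputs, decoder inputs, and decoder targets must each be one-hot encoded with shape (batch, timesteps, num_tokens), and decoder inputs and targets must have the same number of timesteps.

3. Forgetting to return states from the encoder: Create the encoder LSTM with return_state=True and pass its final states to the decoder as initial_state. Otherwise the decoder starts from zero states and never sees the input.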

python

# Wrong: decoder without the encoder's final states (it starts from zero states)
# decoder_outputs, _, _ = decoder_lstm(decoder_inputs)

# Right: pass the encoder states as the decoder's initial state
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
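
The first two pitfalls can be checked before any model is built. The following is a minimal NumPy sketch of preparing teacher-forcing inputs and verifying shapes. It assumes token 0 is reserved as a start token, and the helpers `one_hot` and `make_decoder_inputs` are hypothetical names introduced here for illustration.

```python
import numpy as np

def one_hot(ids, num_tokens):
    """One-hot encode an integer array of shape (batch, timesteps)."""
    return np.eye(num_tokens, dtype=np.float32)[ids]

def make_decoder_inputs(targets, start_token=0):
    """Shift targets right by one step and put start_token in front."""
    inputs = np.full_like(targets, start_token)
    inputs[:, 1:] = targets[:, :-1]
    return inputs

num_tokens = 10  # assumed vocabulary size
sources = np.array([[1, 2, 3], [4, 5, 6]])
targets = np.flip(sources, axis=1)             # [[3, 2, 1], [6, 5, 4]]
decoder_inputs = make_decoder_inputs(targets)  # [[0, 3, 2], [0, 6, 5]]

encoder_x = one_hot(sources, num_tokens)
decoder_x = one_hot(decoder_inputs, num_tokens)
decoder_y = one_hot(targets, num_tokens)

# Every array fed to the model should be (batch, timesteps, num_tokens)
assert encoder_x.shape == decoder_x.shape == decoder_y.shape == (2, 3, num_tokens)
print(decoder_inputs.tolist())  # [[0, 3, 2], [0, 6, 5]]
```

At each timestep the decoder input holds the token that the target had one step earlier, so the model learns to predict the next token from the true previous one.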

📊

Quick Reference

Step	Description
Prepare data	Pair input and output sequences, one-hot encode them
Build encoder	Use LSTM to encode input sequence and get states
Build decoder	Use LSTM with encoder states to generate output sequence
Train model	Use teacher forcing with decoder inputs shifted by one timestep
Predict	Feed input to encoder, then generate output step-by-step with decoder
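
The Predict step is commonly implemented in Keras with two separate inference models that share the trained layers: one that maps an input sequence to the encoder states, and one that runs the decoder a single token at a time. The sketch below shows that pattern under stated assumptions: the sizes, the `START` index, and the `decode_sequence` helper are illustrative, and the layers are untrained so the block runs on its own. In practice, reuse the layers from the fitted training model.

```python
import numpy as np
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model

num_tokens, latent_dim, max_len = 10, 16, 5  # assumed sizes
START = 0  # assumed start-token index

# Training-time layers (untrained here; in practice reuse the fitted ones)
encoder_inputs = Input(shape=(None, num_tokens))
_, state_h, state_c = LSTM(latent_dim, return_state=True)(encoder_inputs)
decoder_inputs = Input(shape=(None, num_tokens))
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_dense = Dense(num_tokens, activation='softmax')

# Inference encoder: input sequence -> final LSTM states
encoder_model = Model(encoder_inputs, [state_h, state_c])

# Inference decoder: one token plus previous states -> probabilities plus new states
state_in = [Input(shape=(latent_dim,)), Input(shape=(latent_dim,))]
dec_out, dec_h, dec_c = decoder_lstm(decoder_inputs, initial_state=state_in)
decoder_model = Model([decoder_inputs] + state_in,
                      [decoder_dense(dec_out), dec_h, dec_c])

def decode_sequence(input_seq):
    """Greedy decoding: feed each predicted token back in as the next input."""
    states = list(encoder_model.predict(input_seq, verbose=0))
    token, decoded = START, []
    for _ in range(max_len):
        step_input = np.zeros((1, 1, num_tokens), dtype='float32')
        step_input[0, 0, token] = 1.0
        probs, h, c = decoder_model.predict([step_input] + states, verbose=0)
        token = int(np.argmax(probs[0, -1]))
        decoded.append(token)
        states = [h, c]
    return decoded

sample = np.eye(num_tokens, dtype='float32')[np.array([[3, 7, 2, 5, 1]])]
print(decode_sequence(sample))  # untrained weights, so the tokens are arbitrary
```

Carrying the states between steps means each step processes only one token, rather than re-running the whole decoder over a partially filled sequence.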

✅

Key Takeaways

Seq2seq models use an encoder to read input and a decoder to generate output sequences.

Always use teacher forcing during training by feeding true previous outputs to the decoder.

Ensure input and output sequences are properly one-hot encoded and shaped.

Pass encoder states to the decoder to initialize its hidden state.

Train on paired sequences and use the model to predict new output sequences.