How to Use Seq2Seq Model in NLP: Simple Guide with Example
A seq2seq model in NLP transforms one sequence into another, like translating sentences. It uses an encoder to read input and a decoder to generate output step-by-step. You train it on paired input-output sequences and then use it to predict new outputs.
Syntax
A typical seq2seq model has two main parts:
- Encoder: Reads the input sequence and creates a summary (context vector).
- Decoder: Uses the summary to generate the output sequence one token at a time.
In code, you define encoder and decoder layers, then connect them in a model that takes input sequences and outputs predicted sequences.
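```python
import tensorflow as tf
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model

# Placeholder sizes for illustration; set these from your own data
num_encoder_tokens = 10  # input vocabulary size
num_decoder_tokens = 10  # output vocabulary size
latent_dim = 16          # size of the LSTM hidden state

# Encoder: reads the one-hot input sequence and keeps only its final states
encoder_inputs = Input(shape=(None, num_encoder_tokens))
encoder_lstm = LSTM(latent_dim, return_state=True)
encoder_outputs, state_h, state_c = encoder_lstm(encoder_inputs)
encoder_states = [state_h, state_c]

# Decoder: starts from the encoder states and returns an output per timestep
decoder_inputs = Input(shape=(None, num_decoder_tokens))
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

# Seq2seq model: maps [encoder input, decoder input] to per-step token probabilities
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
```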
Example
This example shows a simple seq2seq model for translating short sequences of numbers to their reversed form. It trains on pairs like [1,2,3] -> [3,2,1].
```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model

# Parameters
num_encoder_tokens = 10
num_decoder_tokens = 10
latent_dim = 16
max_seq_length = 5
start_token = 0  # never appears in the data, which only uses tokens 1-9

# Generate dummy data: input sequences and reversed output sequences
encoder_input_data = np.random.randint(1, num_encoder_tokens, size=(1000, max_seq_length))
decoder_target_data = np.flip(encoder_input_data, axis=1)

# Teacher forcing: the decoder input is the target shifted right by one
# timestep, with the start token in the first position
decoder_input_data = np.full_like(decoder_target_data, start_token)
decoder_input_data[:, 1:] = decoder_target_data[:, :-1]

# One-hot encode inputs and outputs
encoder_input_data_oh = tf.one_hot(encoder_input_data, num_encoder_tokens)
decoder_input_data_oh = tf.one_hot(decoder_input_data, num_decoder_tokens)
decoder_target_data_oh = tf.one_hot(decoder_target_data, num_decoder_tokens)

# Build model
encoder_inputs = Input(shape=(None, num_encoder_tokens))
encoder_lstm = LSTM(latent_dim, return_state=True)
encoder_outputs, state_h, state_c = encoder_lstm(encoder_inputs)
encoder_states = [state_h, state_c]

decoder_inputs = Input(shape=(None, num_decoder_tokens))
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])

# Train model
model.fit([encoder_input_data_oh, decoder_input_data_oh], decoder_target_data_oh,
          batch_size=64, epochs=5)

# Predict on a sample with greedy decoding: begin with the start token and
# feed each predicted token back in as the next decoder input
sample_input = encoder_input_data_oh[0:1]
decoder_tokens = np.full((1, max_seq_length), start_token)
predicted_seq = np.zeros(max_seq_length, dtype=int)
for t in range(max_seq_length):
    sample_decoder_input = tf.one_hot(decoder_tokens, num_decoder_tokens)
    predictions = model.predict([sample_input, sample_decoder_input], verbose=0)
    predicted_seq[t] = np.argmax(predictions[0, t])
    if t + 1 < max_seq_length:
        decoder_tokens[0, t + 1] = predicted_seq[t]

print('Input sequence:', np.argmax(sample_input[0], axis=1))
print('Predicted reversed sequence:', predicted_seq)
```
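Output
Exact loss, accuracy, and predicted values vary between runs because the training data and initial weights are random.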
Epoch 1/5
16/16 [==============================] - 2s 16ms/step - loss: 2.3026 - accuracy: 0.1000
Epoch 2/5
16/16 [==============================] - 0s 9ms/step - loss: 2.3017 - accuracy: 0.1000
Epoch 3/5
16/16 [==============================] - 0s 9ms/step - loss: 2.3010 - accuracy: 0.1000
Epoch 4/5
16/16 [==============================] - 0s 9ms/step - loss: 2.3003 - accuracy: 0.1000
Epoch 5/5
16/16 [==============================] - 0s 9ms/step - loss: 2.2996 - accuracy: 0.1000
1/1 [==============================] - 0s 131ms/step
Input sequence: [3 7 2 5 1]
Predicted reversed sequence: [0 0 0 0 0]
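As a sanity check on the log above, a loss near 2.30 with an accuracy of 0.10 is what uniform guessing over 10 classes would give, which suggests the model in that run has not yet learned the reversal task. A minimal sketch of the arithmetic, assuming 10 output classes as in `num_decoder_tokens`:

```python
import math

num_classes = 10  # assumed to match num_decoder_tokens in the example

# Cross-entropy of a uniform prediction: -ln(1 / num_classes)
chance_loss = -math.log(1.0 / num_classes)
# Accuracy expected when every class is predicted with equal probability
chance_accuracy = 1.0 / num_classes

print(f"chance-level loss: {chance_loss:.4f}")          # chance-level loss: 2.3026
print(f"chance-level accuracy: {chance_accuracy:.4f}")  # chance-level accuracy: 0.1000
```

If the training loss stays at this level, training longer or using a larger `latent_dim` are reasonable first things to try.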
Common Pitfalls
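1. Not using teacher forcing: During training, the decoder should receive the true previous output token as its input at each step. Without it, the decoder conditions on its own early, mostly wrong predictions, and training tends to be slow or unstable.
2. Mismatched input/output shapes: Encoder and decoder inputs must be one-hot encoded with shape (batch, timesteps, num_tokens), matching the last dimension declared in the Input layers.
3. Forgetting to return states from encoder: Set return_state=True on the encoder LSTM and pass its states to the decoder as initial_state. Otherwise the decoder starts from a zero state and never sees the input.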
```python
# Wrong: decoder without initial state from encoder
# decoder_outputs, _, _ = decoder_lstm(decoder_inputs)  # no initial_state

# Right: pass encoder states to decoder
# decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
```
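To make the first two pitfalls concrete, the sketch below builds teacher-forcing decoder inputs by shifting the targets right by one step, then checks the one-hot shapes. It uses only NumPy. The helper names `shift_right` and `one_hot`, and the choice of token 0 as the start token, are assumptions for illustration and not part of Keras.

```python
import numpy as np

START_TOKEN = 0  # assumption: token 0 is reserved as the start-of-sequence marker


def shift_right(targets, start_token=START_TOKEN):
    """Return decoder inputs: targets shifted right by one step, start token first."""
    decoder_inputs = np.full_like(targets, start_token)
    decoder_inputs[:, 1:] = targets[:, :-1]
    return decoder_inputs


def one_hot(tokens, num_tokens):
    """One-hot encode integer tokens to shape (batch, timesteps, num_tokens)."""
    return np.eye(num_tokens, dtype="float32")[tokens]


targets = np.array([[3, 2, 1], [6, 5, 4]])
decoder_inputs = shift_right(targets)
print(decoder_inputs.tolist())  # [[0, 3, 2], [0, 6, 5]]

num_tokens = 10
decoder_inputs_oh = one_hot(decoder_inputs, num_tokens)
targets_oh = one_hot(targets, num_tokens)
# Decoder inputs and targets must share (batch, timesteps, num_tokens)
print(decoder_inputs_oh.shape, targets_oh.shape)  # (2, 3, 10) (2, 3, 10)
```

At step t the decoder input holds the true token from step t-1, while the target holds the token the model should predict at step t.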
Quick Reference
| Step | Description |
|---|---|
| Prepare data | Pair input and output sequences, one-hot encode them |
| Build encoder | Use LSTM to encode input sequence and get states |
| Build decoder | Use LSTM with encoder states to generate output sequence |
| Train model | Use teacher forcing with decoder inputs shifted by one timestep |
| Predict | Feed input to encoder, then generate output step-by-step with decoder |
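The Predict row describes step-by-step generation: encode the input once, then repeatedly feed the previous token and the decoder state back into the decoder. The sketch below shows only that control flow in plain Python. `encode` and `decode_step` are hypothetical stand-ins for a trained encoder and a single decoder step (here a toy that reverses its input), not Keras functions.

```python
START_TOKEN = 0  # assumption: token 0 marks the start of decoding


def encode(input_tokens):
    """Toy stand-in for the encoder: the 'state' is the list of unread tokens."""
    return list(input_tokens)


def decode_step(prev_token, state):
    """Toy stand-in for one decoder step: emit the last remaining token.

    A trained decoder would instead combine prev_token with its LSTM state,
    produce a probability distribution, and pick the most likely token.
    """
    next_token = state[-1]
    return next_token, state[:-1]


def greedy_decode(input_tokens, max_length):
    """Generate up to max_length tokens, one decoder step at a time."""
    state = encode(input_tokens)  # run the encoder once
    prev_token = START_TOKEN      # the first decoder input is the start token
    output = []
    for _ in range(max_length):
        if not state:             # nothing left to emit: stop early
            break
        # Feed the previous token and state in, get the next token and state out
        prev_token, state = decode_step(prev_token, state)
        output.append(prev_token)
    return output


print(greedy_decode([1, 2, 3], max_length=3))  # [3, 2, 1]
```

With a Keras model, `decode_step` would typically wrap a call to the decoder that takes the previous token plus the LSTM states and returns the next-token probabilities along with updated states.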
Key Takeaways
Seq2seq models use an encoder to read input and a decoder to generate output sequences.
Always use teacher forcing during training by feeding true previous outputs to the decoder.
Ensure input and output sequences are properly one-hot encoded and shaped.
Pass encoder states to the decoder to initialize its hidden state.
Train on paired sequences and use the model to predict new output sequences.
