A Bidirectional LSTM lets a model use context from both earlier and later positions in a sequence. This makes it better at tasks like language understanding, where the meaning of a word often depends on the words that follow it.
Bidirectional LSTM in NLP
from tensorflow.keras.layers import Bidirectional, LSTM
bidirectional_layer = Bidirectional(LSTM(units))
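The Bidirectional wrapper takes a recurrent layer like LSTM, runs one copy over the sequence forwards and a second copy backwards, and merges the two outputs (by concatenation by default).
The units parameter sets the size of the hidden state (the number of memory cells) of the LSTM in each direction.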
Bidirectional(LSTM(64))
Bidirectional(LSTM(32, return_sequences=True))
Bidirectional(LSTM(128, dropout=0.2))
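To make "forwards and backwards" concrete, here is a minimal NumPy sketch of what a bidirectional wrapper does. It is not the Keras implementation: it uses a toy tanh recurrent cell instead of a real LSTM, and the helper names run_rnn and run_bidirectional are made up for illustration.

```python
import numpy as np

def run_rnn(x, w_in, w_rec):
    """Run a toy tanh RNN (a stand-in for an LSTM) over x of shape (timesteps, features)."""
    h = np.zeros(w_rec.shape[0])
    outputs = []
    for x_t in x:
        h = np.tanh(x_t @ w_in + h @ w_rec)
        outputs.append(h)
    return np.stack(outputs)  # shape: (timesteps, units)

def run_bidirectional(x, fwd_weights, bwd_weights):
    """Run one RNN forwards, a second one backwards, and concatenate per timestep."""
    forward = run_rnn(x, *fwd_weights)
    # Feed the reversed sequence, then flip the outputs back so that
    # row t of `backward` lines up with row t of `forward`.
    backward = run_rnn(x[::-1], *bwd_weights)[::-1]
    return np.concatenate([forward, backward], axis=-1)

rng = np.random.default_rng(0)
timesteps, features, units = 5, 3, 4
x = rng.normal(size=(timesteps, features))
# Each direction has its own, separate weights.
fwd = (rng.normal(size=(features, units)), rng.normal(size=(units, units)))
bwd = (rng.normal(size=(features, units)), rng.normal(size=(units, units)))

out = run_bidirectional(x, fwd, bwd)
print(out.shape)  # (5, 8): 4 forward features + 4 backward features per timestep
```

The forward half of each row has only seen inputs up to that timestep, while the backward half has only seen inputs from that timestep to the end, which is why the combined output carries context from both sides.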
This example builds a simple model with a bidirectional LSTM layer to classify sequences. It trains on random data and prints predictions.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Bidirectional, LSTM, Dense

# Sample data: 5 sequences, each with 10 words (integer token IDs)
x_train = np.random.randint(1, 1000, (5, 10))
y_train = np.array([0, 1, 0, 1, 0])  # Binary labels

model = Sequential([
    # Map each token ID (0-999) to a 16-dimensional vector
    Embedding(input_dim=1000, output_dim=16),
    # 8 units per direction, so 16 features after concatenation
    Bidirectional(LSTM(8)),
    # Single sigmoid unit for binary classification
    Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
history = model.fit(x_train, y_train, epochs=3, verbose=2)

# Predict on the training data
predictions = model.predict(x_train)
print('Predictions:', predictions.flatten())
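A Bidirectional LSTM has twice the parameters of a single LSTM with the same units because it trains two separate LSTMs. With the default merge_mode='concat', its output also has 2 x units features.
Use return_sequences=True if you want to stack more recurrent layers, so the next layer receives an output for every timestep.
A Bidirectional LSTM needs the entire sequence before it can run, because the backward pass starts at the last timestep. This makes it a poor fit for streaming data.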
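The parameter doubling can be checked by hand. The sketch below assumes the standard LSTM layout (four gates, each with an input kernel, a recurrent kernel, and a bias); the function names are hypothetical helpers, not part of Keras.

```python
def lstm_param_count(input_dim, units):
    """Parameters of one LSTM: 4 gates x (input kernel + recurrent kernel + bias)."""
    return 4 * (units * (input_dim + units) + units)

def bidirectional_lstm_param_count(input_dim, units):
    """A bidirectional wrapper holds two independent LSTMs of the same size."""
    return 2 * lstm_param_count(input_dim, units)

# Same sizes as the example model: 16-dimensional embeddings feeding LSTM(8).
single = lstm_param_count(input_dim=16, units=8)
both = bidirectional_lstm_param_count(input_dim=16, units=8)
print(single, both)  # 800 1600
```

These numbers can be compared against the layer's row in model.summary() for the example model.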
Bidirectional LSTM reads sequences forward and backward to capture full context.
It often improves performance on tasks where the whole sequence is available up front, such as language understanding and offline time series analysis.
Use the Bidirectional wrapper around an LSTM layer in your model.
Practice
What does a Bidirectional LSTM do differently compared to a standard LSTM?
Solution
Step 1: Understand LSTM directionality
A standard LSTM reads the input sequence only in the forward direction, from start to end.
Step 2: Analyze Bidirectional LSTM behavior
A Bidirectional LSTM reads the sequence both forward and backward, capturing information from past and future context.
Final Answer:
It processes the input sequence in both forward and backward directions to capture more context. -> Option C
Quick Check:
Bidirectional means forward + backward = C [OK]
- Thinking it only reads backward
- Assuming it reduces parameters
- Confusing it with simpler RNNs
Solution
Step 1: Recall Keras Bidirectional syntax
In Keras, the Bidirectional wrapper takes an RNN layer like LSTM as its argument.
Step 2: Check each option
model.add(Bidirectional(LSTM(units=64))) correctly wraps LSTM inside Bidirectional. The other options misuse the syntax or parameters.
Final Answer:
model.add(Bidirectional(LSTM(units=64))) -> Option A
Quick Check:
Bidirectional wraps LSTM layer = A [OK]
- Putting Bidirectional inside LSTM
- Passing units to Bidirectional instead of LSTM
- Using bidirectional=True parameter in LSTM
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Bidirectional, Dense

model = Sequential()
model.add(Bidirectional(LSTM(10, return_sequences=False), input_shape=(5, 8)))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy')

x = np.random.random((2, 5, 8))
pred = model.predict(x)
print(pred.shape)
What will be the shape of pred?
Solution
Step 1: Understand model output shape
The Bidirectional LSTM with 10 units per direction produces 20 features (10 forward + 10 backward). Since return_sequences=False, each direction returns only its final output (the forward LSTM after reading the last timestep, the backward LSTM after reading the first), so the layer outputs shape (batch_size, 20).
Step 2: Dense layer output shape
The Dense layer with 1 unit outputs shape (batch_size, 1). Input batch size is 2, so output shape is (2, 1).
Final Answer:
(2, 1) -> Option B
Quick Check:
Batch size 2, Dense 1 unit = (2, 1) [OK]
- Confusing return_sequences=True vs False
- Forgetting bidirectional doubles units
- Mixing batch and timestep dimensions
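The shape reasoning in this solution can be written down as a small pure-Python helper. This is only a sketch of the shape rules described above; the function names are hypothetical, and it covers the merge modes that produce a single tensor ('concat', 'sum', 'mul', 'ave').

```python
def bidirectional_output_shape(batch, timesteps, units,
                               return_sequences=False, merge_mode="concat"):
    """Output shape of Bidirectional(LSTM(units)) for the single-tensor merge modes."""
    if merge_mode not in ("concat", "sum", "mul", "ave"):
        raise ValueError("this sketch only handles concat, sum, mul and ave")
    # Concatenation stacks both directions; the other modes combine them elementwise.
    features = 2 * units if merge_mode == "concat" else units
    if return_sequences:
        return (batch, timesteps, features)
    return (batch, features)

def dense_output_shape(input_shape, units):
    """Dense acts on the last axis and leaves the leading axes unchanged."""
    return input_shape[:-1] + (units,)

lstm_shape = bidirectional_output_shape(batch=2, timesteps=5, units=10)
pred_shape = dense_output_shape(lstm_shape, units=1)
print(lstm_shape, pred_shape)  # (2, 20) (2, 1)
```

Calling the first helper with return_sequences=True gives (2, 5, 20), which shows how the timestep axis survives when the full sequence is returned.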
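import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Bidirectional, Dense

model = Sequential()
model.add(Bidirectional(LSTM(32), input_shape=(10, 16)))
model.add(Dense(1))  # Output shape: (None, 1)
model.compile(optimizer='adam', loss='mse')

# Training data
X_train = np.random.random((100, 10, 16))
y_train = np.random.random((100,))  # Target shape: (100,)
model.fit(X_train, y_train, epochs=5)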
The error says:
ValueError: Error when checking target: expected dense_1 to have shape (None, 1) but got array with shape (100,)
What is the fix?
Solution
Step 1: Understand error message
The model expects targets with shape (batch_size, 1) because Dense(1) outputs shape (None, 1). But y_train has shape (100,), missing the last dimension.
Step 2: Fix target shape
Reshape y_train to (100, 1) to match model output shape. This fixes the mismatch error.
Final Answer:
Change y_train shape to (100, 1) by reshaping it. -> Option D
Quick Check:
Target shape matches output shape = D [OK]
- Changing model output units instead of target shape
- Setting return_sequences=True unnecessarily
- Removing Bidirectional without reason
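One way to apply the fix is to add the missing trailing axis with NumPy before calling fit. This is a minimal sketch that reuses the y_train name from the question.

```python
import numpy as np

y_train = np.random.random((100,))

# -1 lets NumPy infer the number of rows; the trailing 1 matches Dense(1).
y_fixed = y_train.reshape(-1, 1)

print(y_train.shape, y_fixed.shape)  # (100,) (100, 1)
```

The values are unchanged; only the shape differs, so y_fixed can be passed to model.fit in place of y_train.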
Solution
Step 1: Understand context capture
Bidirectional LSTM reads sequences forward and backward, capturing full context.
Step 2: Fixed-size vector output
Using return_sequences=True outputs a sequence, so applying GlobalMaxPooling1D converts it to a fixed-size vector summarizing important features.
Step 3: Compare options
- Embedding -> Bidirectional(LSTM with return_sequences=True) -> GlobalMaxPooling1D -> Dense: reads in both directions and pools over all timesteps, giving full context and a fixed-size vector.
- Embedding -> Bidirectional(LSTM with return_sequences=False) -> Dense: skips pooling, so only each direction's final output is used.
- Embedding -> LSTM with return_sequences=False -> Dense: unidirectional, so there is no backward context.
- Embedding -> Bidirectional(LSTM with return_sequences=True) -> Dense: outputs a sequence with no pooling, so Dense is applied at every timestep and the result is a sequence rather than a single fixed-size prediction.
Final Answer:
Embedding -> Bidirectional(LSTM with return_sequences=True) -> GlobalMaxPooling1D -> Dense -> Option A
Quick Check:
Pooling after bidirectional LSTM = A [OK]
- Using return_sequences=False loses sequence info
- Skipping pooling leads to shape mismatch
- Using unidirectional LSTM loses backward context
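The pooling step in the chosen architecture can be illustrated with plain NumPy. The sketch below mimics what global max pooling over time does to a sequence output of shape (batch, timesteps, features); global_max_pool_1d is a hypothetical helper, not the Keras layer itself, and the random array stands in for a real Bidirectional LSTM output.

```python
import numpy as np

def global_max_pool_1d(sequence_output):
    """Take the maximum over the timestep axis: (batch, timesteps, features) -> (batch, features)."""
    return sequence_output.max(axis=1)

# Stand-in for the output of Bidirectional(LSTM(8, return_sequences=True))
# on a batch of 2 sequences with 10 timesteps: 8 + 8 = 16 features per timestep.
rng = np.random.default_rng(0)
seq = rng.normal(size=(2, 10, 16))

pooled = global_max_pool_1d(seq)
print(pooled.shape)  # (2, 16)
```

Each pooled feature is the largest value that feature reached at any timestep, so the result has a fixed size regardless of sequence length and can be passed straight to a Dense layer.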
