A bidirectional RNN processes a sequence in both directions, so the representation at each step draws on both earlier and later elements. This improves predictions when context on both sides of a position matters.
Bidirectional RNNs in PyTorch
torch.nn.RNN(input_size, hidden_size, num_layers=1, bidirectional=True)
Set bidirectional=True to make the RNN read input forwards and backwards.
The last dimension of the output doubles to 2 * hidden_size because the forward and backward hidden states are concatenated at each time step.
rnn = torch.nn.RNN(input_size=10, hidden_size=20, bidirectional=True)
rnn = torch.nn.RNN(input_size=5, hidden_size=15, num_layers=2, bidirectional=True)
This code creates a bidirectional RNN and runs a random input through it. The output holds the concatenated forward and backward hidden states for each time step. The hidden state holds the final hidden state of each direction: the forward state after the last step and the backward state after the first step.
import torch
import torch.nn as nn

# Parameters
input_size = 3
hidden_size = 4
seq_len = 5
batch_size = 2

# Create bidirectional RNN
rnn = nn.RNN(input_size, hidden_size, bidirectional=True, batch_first=True)

# Random input: batch_size sequences, each with seq_len steps, each step with input_size features
inputs = torch.randn(batch_size, seq_len, input_size)

# Forward pass
outputs, hidden = rnn(inputs)

print('Output shape:', outputs.shape)
print('Output:', outputs)
print('Hidden shape:', hidden.shape)
print('Hidden:', hidden)
The output size is hidden_size * 2 because it combines forward and backward states.
Use batch_first=True to have input shape as (batch, seq_len, features) for easier handling.
The hidden state has shape (num_layers * 2, batch, hidden_size) because there is one final state per layer and direction. Setting batch_first=True does not change this layout.
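The shape notes above can be checked with a short sketch. It assumes a single-layer nn.RNN with the same sizes as the example; the names forward_out and backward_out are illustrative only. It splits the output into its two halves and compares each direction's final step with the returned hidden state.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed so repeated runs use the same weights and input

hidden_size = 4
rnn = nn.RNN(input_size=3, hidden_size=hidden_size, bidirectional=True, batch_first=True)
inputs = torch.randn(2, 5, 3)  # (batch, seq_len, input_size)

outputs, hidden = rnn(inputs)
print(outputs.shape)  # torch.Size([2, 5, 8])
print(hidden.shape)   # torch.Size([2, 2, 4])

# The last dimension stores the forward states first, then the backward states.
forward_out = outputs[:, :, :hidden_size]
backward_out = outputs[:, :, hidden_size:]

# The forward direction finishes at the last time step,
# the backward direction finishes at the first time step.
forward_matches = torch.allclose(forward_out[:, -1, :], hidden[0])
backward_matches = torch.allclose(backward_out[:, 0, :], hidden[1])
print(forward_matches, backward_matches)  # True True
```

A practical consequence: taking outputs[:, -1, :] as a sequence summary gives the backward direction only one step of context, so concatenating hidden[0] and hidden[1] is usually the safer choice for a single-layer model.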
Bidirectional RNNs read sequences forwards and backwards to capture full context.
The output's last dimension is 2 * hidden_size because the states from the two directions are concatenated.
Useful when both past and future information matter for predictions.
Practice
bidirectional RNN compared to a standard RNN?
Solution
Step 1: Understand standard RNN processing
Standard RNNs process sequences only in the forward direction, so they only see past context.
Step 2: Analyze bidirectional RNN behavior
Bidirectional RNNs process sequences both forward and backward, capturing past and future context.
Final Answer:
It processes the input sequence in both forward and backward directions to capture full context. -> Option A
Quick Check:
Bidirectional = forward + backward context [OK]
- Thinking bidirectional reduces parameters
- Assuming it only reads backward
- Confusing with convolutional layers
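The first mistake in the list above can be checked directly. The sketch below assumes a single-layer nn.RNN with arbitrary sizes; count_parameters is a hypothetical helper, not a PyTorch function. It shows that the bidirectional model holds a second full set of weights for the backward pass.

```python
import torch.nn as nn


def count_parameters(module):
    """Hypothetical helper: total number of elements across all parameters."""
    return sum(p.numel() for p in module.parameters())


uni = nn.RNN(input_size=10, hidden_size=20)
bi = nn.RNN(input_size=10, hidden_size=20, bidirectional=True)

uni_params = count_parameters(uni)
bi_params = count_parameters(bi)

# One direction: 20*10 (input weights) + 20*20 (recurrent weights) + 20 + 20 (biases)
print(uni_params)  # 640
print(bi_params)   # 1280
```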
Solution
Step 1: Recall PyTorch GRU parameters
The bidirectional parameter is a boolean that enables bidirectional processing.
Step 2: Identify correct syntax
Only torch.nn.GRU(input_size=10, hidden_size=20, bidirectional=True) uses bidirectional=True, which is the correct PyTorch syntax.
Final Answer:
torch.nn.GRU(input_size=10, hidden_size=20, bidirectional=True) -> Option B
Quick Check:
bidirectional=True enables two directions [OK]
- Using invalid parameter names like 'direction' or 'two_directions'
- Setting bidirectional=False by mistake
- Confusing input_size and hidden_size
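As a quick sketch of the syntax point, the snippet below builds the GRU from the answer and runs a random input through it. The input sizes are arbitrary, and the direction keyword is one of the invalid names mentioned in the list above, used only to show that an unknown keyword is rejected at construction time.

```python
import torch
import torch.nn as nn

gru = nn.GRU(input_size=10, hidden_size=20, bidirectional=True)

x = torch.randn(6, 3, 10)  # default layout: (seq_len, batch, input_size)
output, h_n = gru(x)
print(output.shape)  # torch.Size([6, 3, 40])
print(h_n.shape)     # torch.Size([2, 3, 20])

# An unknown keyword such as `direction` fails when the module is constructed.
try:
    nn.GRU(input_size=10, hidden_size=20, direction="both")
    rejected = False
except TypeError:
    rejected = True
print(rejected)  # True
```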
rnn = torch.nn.RNN(input_size=5, hidden_size=3, bidirectional=True, batch_first=True)
input = torch.randn(4, 7, 5)  # batch=4, seq_len=7, input_size=5
output, _ = rnn(input)
Solution
Step 1: Understand output shape of bidirectional RNN
Output shape is (batch_size, seq_len, hidden_size * num_directions). Here, num_directions=2.
Step 2: Calculate output shape
hidden_size=3, so output last dimension = 3 * 2 = 6. Batch=4, seq_len=7, so output shape = [4, 7, 6].
Final Answer:
[4, 7, 6] -> Option C
Quick Check:
Output last dim = hidden_size * 2 [OK]
- Forgetting to multiply hidden_size by 2
- Mixing batch and sequence dimensions
- Assuming output shape matches input exactly
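The same shape rule holds when layers are stacked. The sketch below reuses the sizes from the question and adds num_layers=2 as an assumption for illustration. The output shape stays [4, 7, 6], while the hidden state gains one entry per layer and direction.

```python
import torch

rnn = torch.nn.RNN(input_size=5, hidden_size=3, num_layers=2,
                   bidirectional=True, batch_first=True)
x = torch.randn(4, 7, 5)  # batch=4, seq_len=7, input_size=5
output, h_n = rnn(x)

print(output.shape)  # torch.Size([4, 7, 6])  -> only the top layer is returned
print(h_n.shape)     # torch.Size([4, 4, 3])  -> num_layers * 2 final states

# Separate the layer and direction axes: (num_layers, num_directions, batch, hidden_size)
h_n_by_layer = h_n.reshape(2, 2, 4, 3)
top_forward_final = h_n_by_layer[-1, 0]  # top layer, forward direction

# The top layer's forward final state equals the forward half of the last output step.
top_matches = torch.allclose(top_forward_final, output[:, -1, :3])
print(top_matches)  # True
```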
rnn = torch.nn.RNN(input_size=8, hidden_size=4, bidirectional=True)
input = torch.randn(5, 10, 8)
output, hidden = rnn(input)
What is the likely cause of the problem?
Solution
Step 1: Check default input shape for PyTorch RNN
By default, PyTorch RNN expects input shape (seq_len, batch, input_size) unless batch_first=True is set.
Step 2: Analyze given input shape
The input has shape (5, 10, 8), intended as (batch, seq_len, input_size), but batch_first=True is not set, so the RNN reads it as 5 time steps with a batch of 10. PyTorch does not raise an exception here, because the last dimension still equals input_size; the result is silently wrong rather than a crash.
Final Answer:
The RNN should be created with batch_first=True, or the input should be transposed to (seq_len, batch, input_size). -> Option A
Quick Check:
Default RNN input shape = (seq_len, batch, input_size) [OK]
- Assuming bidirectional disables shape rules
- Thinking hidden_size must match input_size
- Passing 2D input instead of 3D
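The layout issue is easiest to see in the hidden state's batch dimension. The sketch below assumes the tensor was meant as batch=5, seq_len=10; the variable names are illustrative. It runs the default layout first, then the two fixes named in the answer.

```python
import torch

torch.manual_seed(0)
x = torch.randn(5, 10, 8)  # intended as (batch=5, seq_len=10, input_size=8)

# Default layout: the tensor is read as (seq_len=5, batch=10, input_size=8).
rnn_default = torch.nn.RNN(input_size=8, hidden_size=4, bidirectional=True)
out_default, h_default = rnn_default(x)
print(h_default.shape)  # torch.Size([2, 10, 4])  -> batch is taken to be 10

# Fix 1: declare the layout when creating the module.
rnn_bf = torch.nn.RNN(input_size=8, hidden_size=4, bidirectional=True, batch_first=True)
out_bf, h_bf = rnn_bf(x)
print(out_bf.shape)  # torch.Size([5, 10, 8])
print(h_bf.shape)    # torch.Size([2, 5, 4])   -> batch is 5, as intended

# Fix 2: keep the default module and transpose the input to (seq_len, batch, input_size).
out_t, h_t = rnn_default(x.transpose(0, 1).contiguous())
print(out_t.shape)   # torch.Size([10, 5, 8])
print(h_t.shape)     # torch.Size([2, 5, 4])
```

Because 2 * hidden_size happens to equal input_size here, out_default has the same shape as the input, which makes the mix-up easy to miss.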
Solution
Step 1: Understand variable-length sequence handling
Packing padded sequences lets PyTorch RNNs skip the padding steps when processing variable-length inputs. This matters most for the backward direction, which would otherwise start reading from the padding.
Step 2: Apply packing with bidirectional LSTM
Use pack_padded_sequence before feeding the input to an LSTM with bidirectional=True, then unpack the result with pad_packed_sequence.
Final Answer:
Use pack_padded_sequence before the LSTM and pad_packed_sequence after, with batch_first=True and bidirectional=True set. -> Option D
Quick Check:
Pack sequences for variable length + bidirectional LSTM [OK]
- Ignoring packing and feeding padded sequences directly
- Disabling bidirectional for variable lengths
- Manually reversing sequences instead of using bidirectional flag
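A minimal sketch of the packing workflow, assuming a batch of two sequences with lengths 5 and 3 and arbitrary layer sizes. enforce_sorted=False is used so the batch does not need to be sorted by length beforehand.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)
lstm = nn.LSTM(input_size=3, hidden_size=4, bidirectional=True, batch_first=True)

# Two sequences padded to length 5; the second has only 3 real steps.
padded = torch.randn(2, 5, 3)
lengths = torch.tensor([5, 3])  # true lengths, kept on the CPU
padded[1, 3:] = 0.0             # zero out the padding positions

# Pack so the LSTM skips the padding in both directions.
packed = pack_padded_sequence(padded, lengths, batch_first=True, enforce_sorted=False)
packed_out, (h_n, c_n) = lstm(packed)

# Unpack back to a padded tensor; padded positions are filled with zeros.
output, out_lengths = pad_packed_sequence(packed_out, batch_first=True)
print(output.shape)  # torch.Size([2, 5, 8])
print(out_lengths)   # tensor([5, 3])
```

With packing, h_n holds each sequence's state at its true last step, so the forward final state of the shorter sequence corresponds to step 3 rather than to the padded end.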
