The nn.GRU layer helps a model remember information from sequences, like words in a sentence, to make better predictions.
nn.GRU layer in PyTorch
torch.nn.GRU(input_size, hidden_size, num_layers=1, bias=True, batch_first=False, dropout=0.0, bidirectional=False)
input_size: Number of features in each input step.
hidden_size: Number of features in the hidden state (memory size).
gru = torch.nn.GRU(input_size=10, hidden_size=20)
gru = torch.nn.GRU(input_size=5, hidden_size=15, num_layers=2, batch_first=True)
This code creates a simple GRU layer and passes a random input through it. It prints the shapes and values of the output and hidden state tensors.
import torch
import torch.nn as nn

# Create a GRU layer
input_size = 3
hidden_size = 5
num_layers = 1
batch_size = 2
seq_length = 4
gru = nn.GRU(input_size, hidden_size, num_layers, batch_first=True)

# Random input: batch_size sequences, each with seq_length steps,
# each step with input_size features
input_tensor = torch.randn(batch_size, seq_length, input_size)

# Initial hidden state (num_layers, batch_size, hidden_size)
h0 = torch.zeros(num_layers, batch_size, hidden_size)

# Forward pass through GRU
output, hn = gru(input_tensor, h0)

print("Output shape:", output.shape)
print("Hidden state shape:", hn.shape)
print("Output tensor:", output)
print("Hidden state tensor:", hn)
The GRU remembers information from previous steps to help with sequence data.
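Setting batch_first=True puts the batch dimension first in the input and output tensors, which is often easier to work with. The hidden state keeps the shape (num_layers, batch_size, hidden_size) either way.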
You can stack multiple GRU layers by increasing num_layers.
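The points above can be checked with a small sketch. The sizes are arbitrary values chosen for illustration; the only thing to take from it is how the shapes of the two return values differ once layers are stacked.

```python
import torch
import torch.nn as nn

# Illustrative sizes (assumptions, not taken from any real dataset)
batch_size, seq_length, input_size, hidden_size = 2, 4, 3, 5

# Two stacked GRU layers; batch_first=True only changes the input/output layout
stacked_gru = nn.GRU(input_size, hidden_size, num_layers=2, batch_first=True)

x = torch.randn(batch_size, seq_length, input_size)
output, hn = stacked_gru(x)  # the initial hidden state defaults to zeros when omitted

# output holds the top layer's state at every step: (batch, seq_len, hidden_size)
output_shape = tuple(output.shape)
# hn holds the final state of every layer: (num_layers, batch, hidden_size)
hn_shape = tuple(hn.shape)

print("Output shape:", output_shape)
print("Hidden state shape:", hn_shape)
```

Note that hn is not batch-first even though batch_first=True was set.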
The nn.GRU layer helps models learn from sequences by keeping memory of past inputs.
It takes input size and hidden size as main settings.
Use it when working with time or sequence data like text, speech, or sensor readings.
Practice
nn.GRU layer in PyTorch?
Solution
Step 1: Understand the role of GRU
The GRU (Gated Recurrent Unit) is designed to handle sequences by keeping track of past inputs, which helps in tasks like text or speech processing.
Step 2: Compare with other options
The other options describe unrelated tasks: dimensionality reduction using PCA, image classification using convolution, and random number generation, which are not the purpose of GRU.
Final Answer:
To process sequential data by remembering past information -> Option C
Quick Check:
GRU = sequence memory [OK]
- Confusing GRU with convolution layers
- Thinking GRU reduces data dimensions like PCA
- Assuming GRU generates random values
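One way to see the "sequence memory" idea in practice is to feed a sequence in two chunks and pass the hidden state from the first chunk into the second. The sketch below uses made-up sizes and should give the same result as processing the whole sequence at once, because the hidden state carries everything the layer remembers.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

gru = nn.GRU(input_size=3, hidden_size=5, batch_first=True)
x = torch.randn(2, 6, 3)  # batch=2, seq_len=6, input_size=3 (illustrative sizes)

# Process the whole sequence in one call
full_out, full_h = gru(x)

# Process the same sequence in two halves, handing the hidden state across
first_out, h_mid = gru(x[:, :3].contiguous())
second_out, h_end = gru(x[:, 3:].contiguous(), h_mid)
chunked_out = torch.cat([first_out, second_out], dim=1)

# The hidden state is the layer's memory, so both routes should agree
outputs_match = torch.allclose(full_out, chunked_out, atol=1e-6)
hidden_match = torch.allclose(full_h, h_end, atol=1e-6)
print(outputs_match, hidden_match)
```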
Solution
Step 1: Recall GRU constructor parameters
The correct order and names are input_size first, then hidden_size. So nn.GRU(input_size=10, hidden_size=20) is correct.
Step 2: Check other options
nn.GRU(20, 10) reverses the sizes. nn.GRU(hidden_size=10, input_size=20) assigns the two values to the wrong parameters. nn.GRU(10) omits the required hidden_size argument.
Final Answer:
nn.GRU(input_size=10, hidden_size=20) -> Option B
Quick Check:
Input size first, hidden size second [OK]
- Swapping input_size and hidden_size
- Omitting hidden_size parameter
- Using wrong parameter names
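A short sketch of the constructor behaviour discussed here. With positional arguments the order matters; with keyword arguments only the names matter, so the mistake to avoid is attaching a value to the wrong name. The variable names are illustrative.

```python
import torch.nn as nn

# Positional: first argument is input_size, second is hidden_size
positional = nn.GRU(10, 20)

# Keyword: order is free, as long as each value goes with the right name
keyword = nn.GRU(hidden_size=20, input_size=10)

sizes_match = (positional.input_size, positional.hidden_size) == (
    keyword.input_size,
    keyword.hidden_size,
)

# Leaving out hidden_size is an error, since it has no default value
try:
    nn.GRU(10)
    missing_hidden_raises = False
except TypeError:
    missing_hidden_raises = True

print(sizes_match, missing_hidden_raises)
```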
out?
import torch
import torch.nn as nn

gru = nn.GRU(input_size=5, hidden_size=3, batch_first=True)
x = torch.randn(4, 7, 5)  # batch=4, seq_len=7, input_size=5
out, h_n = gru(x)
print(out.shape)
Solution
Step 1: Understand batch_first=True effect
With batch_first=True, input shape is (batch, seq_len, input_size). Output shape matches (batch, seq_len, hidden_size).
Step 2: Apply shapes from code
Input is (4, 7, 5), hidden_size=3, so the shape of out is (4, 7, 3).
Final Answer:
(4, 7, 3) -> Option A
Quick Check:
Output shape = (batch, seq_len, hidden_size) [OK]
- Confusing batch and sequence dimensions
- Ignoring batch_first parameter
- Mixing hidden_size with input_size
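As a follow-up to the shape question, the sketch below relates the two return values. It assumes a single-layer, unidirectional GRU like the one in the question; in that case the last time step of out should equal the final hidden state h_n.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

gru = nn.GRU(input_size=5, hidden_size=3, batch_first=True)
x = torch.randn(4, 7, 5)  # batch=4, seq_len=7, input_size=5
out, h_n = gru(x)

out_shape = tuple(out.shape)  # (batch, seq_len, hidden_size)
h_n_shape = tuple(h_n.shape)  # (num_layers, batch, hidden_size)

# For one unidirectional layer, the last output step is the final hidden state
last_step = out[:, -1, :]
final_hidden = h_n[-1]
last_step_is_hidden = torch.allclose(last_step, final_hidden, atol=1e-6)

print(out_shape, h_n_shape, last_step_is_hidden)
```

This shortcut does not hold for bidirectional GRUs or padded batches, where the last valid step differs per sequence.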
import torch
import torch.nn as nn

gru = nn.GRU(input_size=8, hidden_size=4)
x = torch.randn(5, 10, 8)
out, h = gru(x)
print(out.shape)
Solution
Step 1: Check default GRU input expectations
By default, GRU expects input shape (seq_len, batch, input_size). Here, input is (5, 10, 8), so seq_len=5, batch=10, input_size=8, which matches the default.
Step 2: Verify output shape
Output shape will be (seq_len, batch, hidden_size) = (5, 10, 4).
Step 3: Evaluate statements
The code runs without errors and prints (5, 10, 4). hidden_size can be smaller than input_size. batch_first=True is not required. Input shape is correct for default settings.
Final Answer:
The code runs without errors and prints (5, 10, 4) -> Option A
Quick Check:
Default GRU input shape = (seq_len, batch, input_size) [OK]
- Assuming batch is first dimension without batch_first=True
- Thinking hidden_size must be bigger than input_size
- Expecting output shape to swap batch and seq_len
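The first mistake above is easy to make because the layer raises no error when the batch and sequence dimensions are swapped; it just reads dimension 0 as time. A minimal sketch, assuming the data arrives batch-first and the layer uses the default layout, is to transpose before the call:

```python
import torch
import torch.nn as nn

gru = nn.GRU(input_size=8, hidden_size=4)  # default: batch_first=False

# Data stored batch-first: batch=10, seq_len=5, input_size=8 (illustrative sizes)
batch_major = torch.randn(10, 5, 8)

# Swap to the default (seq_len, batch, input_size) layout before the call
time_major = batch_major.transpose(0, 1).contiguous()
out, h = gru(time_major)

out_shape = tuple(out.shape)  # (seq_len, batch, hidden_size)
h_shape = tuple(h.shape)      # (num_layers, batch, hidden_size)
print(out_shape, h_shape)
```

Constructing the layer with batch_first=True is the alternative when the data is always batch-first.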
Solution
Step 1: Understand variable-length sequence handling
PyTorch requires sequences in a batch to be the same length or packed. Padding sequences and using pack_padded_sequence allows GRU to ignore padded parts.
Step 2: Evaluate options
"Pad sequences to the same length and use pack_padded_sequence before feeding to nn.GRU" is correct: it pads and packs the sequences. "Feed raw variable-length sequences directly to nn.GRU without padding" is invalid because GRU cannot handle raw variable-length sequences. "Use nn.GRU with batch_first=False and ignore sequence lengths" ignores lengths, causing wrong results. "Manually truncate all sequences to the shortest length before input" loses data by truncation.
Final Answer:
Pad sequences and use pack_padded_sequence before nn.GRU -> Option D
Quick Check:
Use padding + packing for variable-length sequences [OK]
- Feeding variable-length sequences without padding
- Ignoring sequence lengths in batch
- Truncating sequences losing data
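The padding-and-packing approach can be sketched as follows. The three sequence lengths and the feature size are made-up values for illustration; pad_sequence, pack_padded_sequence, and pad_packed_sequence come from torch.nn.utils.rnn.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)

# Three sequences of different lengths, each step with 8 features (illustrative)
sequences = [torch.randn(5, 8), torch.randn(3, 8), torch.randn(2, 8)]
lengths = torch.tensor([len(s) for s in sequences])

# Pad to a common length, then pack so the GRU skips the padded steps
padded = pad_sequence(sequences, batch_first=True)  # (3, 5, 8)
packed = pack_padded_sequence(padded, lengths, batch_first=True, enforce_sorted=False)

gru = nn.GRU(input_size=8, hidden_size=4, batch_first=True)
packed_out, h_n = gru(packed)

# Unpack back to a padded tensor; positions past each length are filled with zeros
out, out_lengths = pad_packed_sequence(packed_out, batch_first=True)

print(out.shape, out_lengths.tolist(), h_n.shape)
```

With packing, h_n holds the state at each sequence's last real step rather than at the last padded step, which is usually what a downstream classifier needs.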
