nn.GRU layer in PyTorch - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output
intermediate
Output shape of nn.GRU layer
Given the following PyTorch code, what is the shape of the output tensor out after running the GRU layer?
import torch
import torch.nn as nn

gru = nn.GRU(input_size=10, hidden_size=20, num_layers=2)
input_tensor = torch.randn(5, 3, 10)  # (seq_len, batch, input_size)
out, h_n = gru(input_tensor)
print(out.shape)
A. torch.Size([3, 20, 5])
B. torch.Size([3, 5, 20])
C. torch.Size([5, 20, 3])
D. torch.Size([5, 3, 20])
Hint: With the default batch_first=False, the GRU output shape is (seq_len, batch, hidden_size).
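The shapes in this challenge can be checked directly. The following is a minimal sketch using the same configuration as the question; it also prints the shape of h_n, which the question does not ask about but which depends on num_layers.

```python
import torch
import torch.nn as nn

# Same configuration as the challenge: 2 stacked layers, default batch_first=False.
gru = nn.GRU(input_size=10, hidden_size=20, num_layers=2)
input_tensor = torch.randn(5, 3, 10)  # (seq_len, batch, input_size)

out, h_n = gru(input_tensor)

# out holds the top layer's hidden state at every time step.
print(out.shape)  # torch.Size([5, 3, 20])
# h_n holds the final hidden state of each stacked layer.
print(h_n.shape)  # torch.Size([2, 3, 20])
```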
Model Choice
intermediate
Choosing GRU for sequence data
You want to build a model to predict the next word in a sentence using sequential text data. Which of the following is the best reason to choose an nn.GRU layer over a simple RNN layer?
A. GRU gates let it capture long-term dependencies better than a simple RNN, which is more prone to vanishing gradients.
B. GRU layers are faster to train because they use convolution operations.
C. Simple RNNs have built-in attention mechanisms, but GRUs do not.
D. GRU layers require the input to be one-hot encoded, unlike simple RNNs.
Hint: Think about how GRUs handle memory compared to simple RNNs.
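The gating in a GRU comes at a cost in parameters. The sketch below compares parameter counts for single-layer nn.RNN, nn.GRU, and nn.LSTM with the same sizes; count_params is a helper name introduced here for illustration, and the sizes are arbitrary.

```python
import torch.nn as nn


def count_params(module: nn.Module) -> int:
    """Total number of scalar parameters in a module (illustrative helper)."""
    return sum(p.numel() for p in module.parameters())


input_size, hidden_size = 10, 20  # arbitrary sizes for the comparison

rnn = nn.RNN(input_size, hidden_size)
gru = nn.GRU(input_size, hidden_size)
lstm = nn.LSTM(input_size, hidden_size)

# A GRU has 3 gate blocks and an LSTM has 4, versus 1 block in a simple RNN.
print(count_params(rnn))   # 640
print(count_params(gru))   # 1920
print(count_params(lstm))  # 2560
```

With these sizes the GRU has three times the parameters of the simple RNN and three quarters of the LSTM's, so the usual parameter-saving argument for a GRU is relative to an LSTM, not to a simple RNN.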
Hyperparameter
advanced
Effect of num_layers in nn.GRU
What is the effect of increasing the num_layers parameter in an nn.GRU layer from 1 to 3?
A. The GRU will have 3 stacked layers, allowing it to learn more complex features from the sequence.
B. The GRU will process the input sequence 3 times in parallel and average the results.
C. The GRU will increase the hidden size by 3 times automatically.
D. The GRU will reduce the sequence length by a factor of 3.
Hint: Think about what stacking layers means in neural networks.
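To see what stacking changes in practice, the following sketch compares a 1-layer and a 3-layer GRU on the same input. The sizes are arbitrary and chosen only for illustration.

```python
import torch
import torch.nn as nn

gru1 = nn.GRU(input_size=10, hidden_size=20, num_layers=1)
gru3 = nn.GRU(input_size=10, hidden_size=20, num_layers=3)

x = torch.randn(5, 3, 10)  # (seq_len, batch, input_size)
out1, h1 = gru1(x)
out3, h3 = gru3(x)

# out always comes from the top layer, so its shape does not change.
print(out1.shape, out3.shape)  # torch.Size([5, 3, 20]) torch.Size([5, 3, 20])
# h_n has one final hidden state per stacked layer.
print(h1.shape, h3.shape)      # torch.Size([1, 3, 20]) torch.Size([3, 3, 20])

# Layer 0 reads the input; layers 1 and 2 read the previous layer's output.
print(gru3.weight_ih_l0.shape)  # torch.Size([60, 10])
print(gru3.weight_ih_l1.shape)  # torch.Size([60, 20])
```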
Metrics
advanced
Interpreting GRU training loss
During training of a GRU-based model for time series prediction, the training loss decreases steadily but the validation loss starts increasing after some epochs. What does this indicate?
A. The GRU layer is not suitable for time series data.
B. The model is underfitting and needs more training epochs.
C. The model is overfitting the training data and not generalizing well.
D. The learning rate is too low, causing slow convergence.
Hint: Think about what it means when validation loss rises but training loss falls.
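The pattern described in this question can be spotted programmatically from the recorded losses. The sketch below is one simple way to do it; the loss values are made up for illustration, and best_epoch is a hypothetical helper, not part of PyTorch.

```python
def best_epoch(val_losses):
    """Index of the epoch with the lowest validation loss (illustrative helper)."""
    return min(range(len(val_losses)), key=val_losses.__getitem__)


# Made-up loss curves: training keeps improving, validation turns upward.
train_losses = [0.90, 0.70, 0.55, 0.45, 0.38, 0.33]
val_losses = [0.95, 0.80, 0.72, 0.74, 0.79, 0.86]

epoch = best_epoch(val_losses)

# Overfitting signature: since the best validation epoch, training loss
# went down while validation loss went up.
overfitting = (
    train_losses[-1] < train_losses[epoch]
    and val_losses[-1] > val_losses[epoch]
)

print(epoch)        # 2
print(overfitting)  # True
```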
🔧 Debug
expert
Identifying error in GRU input shape
You run this code and get a runtime error. What is the cause?
import torch
import torch.nn as nn

gru = nn.GRU(input_size=8, hidden_size=16)
input_tensor = torch.randn(4, 10)  # 2D: no batch dimension, and last dim 10 != input_size 8
out, h_n = gru(input_tensor)
A. Hidden size must be equal to input size, but here they differ.
B. The input tensor does not match the shape nn.GRU expects: a batched input must be 3D with shape (seq_len, batch, input_size) and a last dimension equal to input_size=8, but input_tensor is 2D with a last dimension of 10.
C. GRU requires input tensor to be on GPU but input_tensor is on CPU.
D. The batch size must be the first dimension, but here it is the second.
Hint: Check the expected input shape for nn.GRU layers.
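The failure and one possible fix can be reproduced as follows. This is a sketch, not the only fix: it assumes the intent was a single sequence of 4 steps with 8 features. The exact error message depends on the PyTorch version. Recent versions also accept a 2D input as an unbatched (seq_len, input_size) sequence, but still reject this tensor because its last dimension is 10, not 8.

```python
import torch
import torch.nn as nn

gru = nn.GRU(input_size=8, hidden_size=16)

# Shape from the question: 2D, and the last dimension does not match input_size.
bad_input = torch.randn(4, 10)
try:
    gru(bad_input)
    raised = False
except RuntimeError as err:
    raised = True
    print("RuntimeError:", err)

# Assumed fix: explicit batch dimension and 8 features per time step.
good_input = torch.randn(4, 1, 8)  # (seq_len=4, batch=1, input_size=8)
out, h_n = gru(good_input)
print(out.shape)  # torch.Size([4, 1, 16])
```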

Practice

1. What is the primary purpose of the nn.GRU layer in PyTorch?
easy
A. To reduce the dimensionality of data using PCA
B. To perform image classification using convolution
C. To process sequential data by remembering past information
D. To generate random numbers for initialization

Solution

  1. Step 1: Understand the role of GRU

    The GRU (Gated Recurrent Unit) is designed to handle sequences by keeping track of past inputs, which helps in tasks like text or speech processing.
  2. Step 2: Compare with other options

    The other options describe unrelated tasks: dimensionality reduction using PCA, image classification using convolution, and random number generation, which are not the purpose of GRU.
  3. Final Answer:

    To process sequential data by remembering past information -> Option C
  4. Quick Check:

    GRU = sequence memory [OK]
Hint: GRU remembers past inputs in sequences
Common Mistakes:
  • Confusing GRU with convolution layers
  • Thinking GRU reduces data dimensions like PCA
  • Assuming GRU generates random values
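What "remembering past information" means can be shown with the hidden state. In this sketch (sizes and seed are arbitrary), a sequence is processed once in full and once in two halves; passing the hidden state from the first half into the second should reproduce the full-sequence output up to floating-point tolerance.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
gru = nn.GRU(input_size=4, hidden_size=6)
x = torch.randn(6, 1, 4)  # (seq_len, batch, input_size)

with torch.no_grad():
    # Whole sequence in one call.
    out_full, _ = gru(x)
    # Same sequence in two chunks, carrying the hidden state across.
    out_a, h_a = gru(x[:3])
    out_b, _ = gru(x[3:], h_a)
    # Second chunk without the carried state (starts from zeros).
    out_b_fresh, _ = gru(x[3:])

chunked = torch.cat([out_a, out_b], dim=0)
same = torch.allclose(out_full, chunked, atol=1e-5)
print("chunked matches full:", same)
print("fresh start matches:", torch.allclose(out_b, out_b_fresh, atol=1e-5))
```

The second comparison is included to show that dropping the carried hidden state changes the result, because the layer no longer knows about the first three steps.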
2. Which of the following is the correct way to create a GRU layer with input size 10 and hidden size 20 in PyTorch?
easy
A. nn.GRU(20, 10)
B. nn.GRU(input_size=10, hidden_size=20)
C. nn.GRU(hidden_size=10, input_size=20)
D. nn.GRU(10)

Solution

  1. Step 1: Recall GRU constructor parameters

    The constructor takes input_size first and hidden_size second, either positionally or as keywords. So nn.GRU(input_size=10, hidden_size=20) is correct.
  2. Step 2: Check other options

    nn.GRU(20, 10) reverses the sizes. nn.GRU(hidden_size=10, input_size=20) assigns the two values to the wrong keywords. nn.GRU(10) omits the required hidden_size argument.
  3. Final Answer:

    nn.GRU(input_size=10, hidden_size=20) -> Option B
  4. Quick Check:

    Input size first, hidden size second [OK]
Hint: input_size comes before hidden_size in nn.GRU
Common Mistakes:
  • Swapping input_size and hidden_size
  • Omitting hidden_size parameter
  • Using wrong parameter names
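The constructor behavior discussed above can be checked with a short sketch. The variable names are illustrative only.

```python
import torch.nn as nn

# Keyword form from the correct option.
gru = nn.GRU(input_size=10, hidden_size=20)
print(gru.input_size, gru.hidden_size)  # 10 20

# The three GRU gate blocks are stacked along the first weight dimension.
print(gru.weight_ih_l0.shape)  # torch.Size([60, 10])
print(gru.weight_hh_l0.shape)  # torch.Size([60, 20])

# Positional form: the first argument is input_size, the second hidden_size.
swapped = nn.GRU(20, 10)
print(swapped.input_size, swapped.hidden_size)  # 20 10

# Leaving out hidden_size fails at construction time.
try:
    nn.GRU(10)
    missing_arg_error = False
except TypeError:
    missing_arg_error = True
print(missing_arg_error)  # True
```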
3. Given the following code, what is the shape of the output tensor out?
import torch
import torch.nn as nn

gru = nn.GRU(input_size=5, hidden_size=3, batch_first=True)
x = torch.randn(4, 7, 5)  # batch=4, seq_len=7, input_size=5
out, h_n = gru(x)
print(out.shape)
medium
A. (4, 7, 3)
B. (7, 4, 3)
C. (4, 3, 7)
D. (7, 3, 4)

Solution

  1. Step 1: Understand batch_first=True effect

    With batch_first=True, the input has shape (batch, seq_len, input_size) and the output out has shape (batch, seq_len, hidden_size).
  2. Step 2: Apply shapes from code

    Input is (4, 7, 5), hidden_size=3, so output out shape is (4, 7, 3).
  3. Final Answer:

    (4, 7, 3) -> Option A
  4. Quick Check:

    Output shape = (batch, seq_len, hidden_size) [OK]
Hint: batch_first=True means batch is the first dimension
Common Mistakes:
  • Confusing batch and sequence dimensions
  • Ignoring batch_first parameter
  • Mixing hidden_size with input_size
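One detail that is easy to miss: batch_first affects the input and out, but not h_n. The sketch below reuses the code from the question and adds a check that, for a single-layer unidirectional GRU, the last time step of out matches h_n.

```python
import torch
import torch.nn as nn

gru = nn.GRU(input_size=5, hidden_size=3, batch_first=True)
x = torch.randn(4, 7, 5)  # batch=4, seq_len=7, input_size=5

out, h_n = gru(x)

print(out.shape)  # torch.Size([4, 7, 3])
# h_n stays (num_layers, batch, hidden_size) even with batch_first=True.
print(h_n.shape)  # torch.Size([1, 4, 3])

# Last time step of out versus the final hidden state of the only layer.
last_step_matches = torch.allclose(out[:, -1, :], h_n[0], atol=1e-6)
print(last_step_matches)
```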
4. Which of the following correctly describes the execution of this code snippet?
import torch
import torch.nn as nn

gru = nn.GRU(input_size=8, hidden_size=4)
x = torch.randn(5, 10, 8)
out, h = gru(x)
print(out.shape)
medium
A. The code runs without errors and prints torch.Size([5, 10, 4])
B. The hidden_size must be larger than input_size
C. The GRU layer requires batch_first=True for this input shape
D. The input tensor shape is incorrect for default GRU settings

Solution

  1. Step 1: Check default GRU input expectations

    By default, GRU expects input shape (seq_len, batch, input_size). Here, input is (5, 10, 8), so seq_len=5, batch=10, input_size=8 which matches default.
  2. Step 2: Verify output shape

    Output shape will be (seq_len, batch, hidden_size) = (5, 10, 4).
  3. Step 3: Evaluate statements

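    Option A is correct: the code runs without errors and prints torch.Size([5, 10, 4]). Option B is wrong because hidden_size can be smaller than input_size. Option C is wrong because batch_first=True is not required. Option D is wrong because the input shape is valid for the default settings.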
  4. Final Answer:

    The code runs without errors and prints torch.Size([5, 10, 4]) -> Option A
  5. Quick Check:

    Default GRU input shape = (seq_len, batch, input_size) [OK]
Hint: Default GRU expects seq_len first, batch second
Common Mistakes:
  • Assuming batch is first dimension without batch_first=True
  • Thinking hidden_size must be bigger than input_size
  • Expecting output shape to swap batch and seq_len
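The two layouts are interchangeable as long as the data is transposed to match. In this sketch, two GRUs share the same weights (copied with load_state_dict) and differ only in batch_first; the names gru_default and gru_bf are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
gru_default = nn.GRU(input_size=8, hidden_size=4)                 # expects (seq_len, batch, feat)
gru_bf = nn.GRU(input_size=8, hidden_size=4, batch_first=True)    # expects (batch, seq_len, feat)
gru_bf.load_state_dict(gru_default.state_dict())                  # same weights in both layers

x = torch.randn(5, 10, 8)  # seq_len=5, batch=10, input_size=8

with torch.no_grad():
    out_default, _ = gru_default(x)
    # Swap the first two dimensions to get the batch-first layout.
    out_bf, _ = gru_bf(x.transpose(0, 1))

print(out_default.shape)  # torch.Size([5, 10, 4])
print(out_bf.shape)       # torch.Size([10, 5, 4])

layouts_agree = torch.allclose(out_default, out_bf.transpose(0, 1), atol=1e-6)
print(layouts_agree)
```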
5. You want to build a GRU-based model to process variable-length sequences in a batch. Which approach correctly handles this in PyTorch?
hard
A. Feed raw variable-length sequences directly to nn.GRU without padding
B. Manually truncate all sequences to the shortest length before input
C. Use nn.GRU with batch_first=False and ignore sequence lengths
D. Pad sequences to the same length and use pack_padded_sequence before feeding to nn.GRU

Solution

  1. Step 1: Understand variable-length sequence handling

    PyTorch requires sequences in a batch to be the same length or packed. Padding sequences and using pack_padded_sequence allows GRU to ignore padded parts.
  2. Step 2: Evaluate options

    Option D correctly pads the sequences and packs them with pack_padded_sequence. Option A is invalid because a batch tensor cannot hold raw sequences of different lengths. Option C ignores the sequence lengths, so padded steps are processed as if they were real data. Option B loses data by truncating every sequence to the shortest length.
  3. Final Answer:

    Pad sequences and use pack_padded_sequence before nn.GRU -> Option D
  4. Quick Check:

    Use padding + packing for variable-length sequences [OK]
Hint: Pad then pack sequences before GRU
Common Mistakes:
  • Feeding variable-length sequences without padding
  • Ignoring sequence lengths in batch
  • Truncating sequences losing data
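The pad-then-pack approach can be sketched with the utilities in torch.nn.utils.rnn. The sequence lengths and feature size below are arbitrary, and enforce_sorted=False is used so the batch does not have to be sorted by length beforehand.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence, pad_packed_sequence

# Three sequences of different lengths, each with 8 features per step.
seqs = [torch.randn(5, 8), torch.randn(3, 8), torch.randn(2, 8)]
lengths = torch.tensor([len(s) for s in seqs])

# 1) Pad to a common length: shape (batch=3, max_len=5, 8).
padded = pad_sequence(seqs, batch_first=True)

# 2) Pack so the GRU skips the padded time steps.
packed = pack_padded_sequence(padded, lengths, batch_first=True, enforce_sorted=False)

gru = nn.GRU(input_size=8, hidden_size=4, batch_first=True)
packed_out, h_n = gru(packed)

# 3) Unpack back to a padded tensor plus the original lengths.
out, out_lengths = pad_packed_sequence(packed_out, batch_first=True)

print(out.shape)             # torch.Size([3, 5, 4])
print(out_lengths.tolist())  # [5, 3, 2]
print(h_n.shape)             # torch.Size([1, 3, 4])

# Positions past each sequence's true length are filled with zeros.
print(out[1, 3:].abs().sum().item())  # 0.0
```

With packing, h_n holds each sequence's hidden state at its own last real step rather than at the padded end.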