GRU for text in NLP - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output
intermediate
Output of GRU layer with simple input
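What is the shape of the output tensor after passing a batch of 2 sequences, each of length 3 with 4 features, through a single-layer, unidirectional GRU with hidden size 5 and batch_first=True? (PyTorch's nn.GRU always returns the full output sequence, which corresponds to return_sequences=True in Keras.)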
import torch
import torch.nn as nn

gru = nn.GRU(input_size=4, hidden_size=5, batch_first=True, bidirectional=False)
input_tensor = torch.randn(2, 3, 4)  # batch=2, seq_len=3, features=4
output, hidden = gru(input_tensor)
print(output.shape)
A. torch.Size([3, 2, 5])
B. torch.Size([2, 3, 5])
C. torch.Size([2, 5])
D. torch.Size([2, 3, 4])
💡 Hint
Remember that batch_first=True means batch size is the first dimension in input and output.
🧠 Conceptual
intermediate
Purpose of the reset gate in GRU
What is the main role of the reset gate in a GRU cell when processing text sequences?
A. It normalizes the input features before processing.
B. It controls how much of the previous hidden state to keep for the final output.
C. It initializes the hidden state at the start of the sequence.
D. It decides how much past information to forget before calculating the new candidate activation.
💡 Hint
Think about which gate helps the model reset memory for new inputs.
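To see where each gate enters the computation, here is a minimal sketch of one GRU step written out by hand and compared against PyTorch's nn.GRUCell. The sizes and variable names are assumptions made for illustration; the gate equations follow the formulation given in the PyTorch documentation.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
input_size, hidden_size = 4, 5            # illustrative sizes (assumed)
cell = nn.GRUCell(input_size, hidden_size)

x = torch.randn(2, input_size)            # batch of 2 inputs at one time step
h = torch.randn(2, hidden_size)           # previous hidden state

# nn.GRUCell stacks its weights in the order: reset, update, new (candidate).
w_ir, w_iz, w_in = cell.weight_ih.chunk(3, dim=0)
w_hr, w_hz, w_hn = cell.weight_hh.chunk(3, dim=0)
b_ir, b_iz, b_in = cell.bias_ih.chunk(3, dim=0)
b_hr, b_hz, b_hn = cell.bias_hh.chunk(3, dim=0)

r = torch.sigmoid(x @ w_ir.T + b_ir + h @ w_hr.T + b_hr)   # reset gate
z = torch.sigmoid(x @ w_iz.T + b_iz + h @ w_hz.T + b_hz)   # update gate
# The reset gate scales the past state's contribution to the candidate.
n = torch.tanh(x @ w_in.T + b_in + r * (h @ w_hn.T + b_hn))
h_manual = (1 - z) * n + z * h                              # new hidden state

h_ref = cell(x, h)
print(torch.allclose(h_manual, h_ref, atol=1e-5))
```

The reset gate r appears only inside the candidate n, while the update gate z blends the candidate with the previous state.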
Hyperparameter
advanced
Choosing GRU hidden size for text classification
You want to build a GRU-based text classifier. Which hidden size choice is most likely to balance model capacity and training speed for a medium-sized dataset?
A. Hidden size = 128
B. Hidden size = 1000
C. Hidden size = 1
D. Hidden size = 10
💡 Hint
Too small hidden size limits learning; too large slows training and risks overfitting.
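As a rough way to compare the options, the sketch below counts the trainable parameters of a single-layer, unidirectional GRU using PyTorch's weight layout. The embedding size of 100 is an assumption for illustration, since the question does not give one, and gru_param_count is a hypothetical helper.

```python
def gru_param_count(input_size: int, hidden_size: int) -> int:
    """Parameters in a single-layer, unidirectional GRU (PyTorch layout).

    Three gates, each with an input weight matrix, a hidden weight matrix,
    and two bias vectors.
    """
    return 3 * hidden_size * (input_size + hidden_size + 2)


embedding_size = 100  # assumed for illustration; not given in the question
for hidden_size in (1, 10, 128, 1000):
    print(hidden_size, gru_param_count(embedding_size, hidden_size))
```

The count grows roughly quadratically in the hidden size, which is why very large hidden sizes slow training and need more data to avoid overfitting.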
Metrics
advanced
Evaluating GRU model performance on text data
After training a GRU model for sentiment analysis, you get these results on the test set: accuracy=0.85, precision=0.60, recall=0.95. What does this tell you about the model's predictions?
A. The model rarely finds positive cases but is very precise when it does.
B. The model has balanced precision and recall.
C. The model finds most positive cases but also has many false positives.
D. The model is overfitting the training data.
💡 Hint
High recall but low precision means many false alarms.
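The relationship between these three metrics can be checked from confusion-matrix counts. The sketch below uses hypothetical counts, chosen only so that they reproduce the figures in the question; classification_metrics is an illustrative helper, not part of any library.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, and recall from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),   # share of positive predictions that are right
        "recall": tp / (tp + fn),      # share of actual positives that are found
    }


# Hypothetical counts chosen to reproduce the figures in the question.
metrics = classification_metrics(tp=171, fp=114, fn=9, tn=526)
print(metrics)
```

With these counts, 114 of the 285 positive predictions are wrong, while only 9 of the 180 actual positives are missed.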
🔧 Debug
expert
Identifying error in GRU input shape
You run the PyTorch code below and get a runtime error. What is the cause of the error?
import torch
import torch.nn as nn

gru = nn.GRU(input_size=10, hidden_size=20, batch_first=True)
input_tensor = torch.randn(5, 20, 10)
output, hidden = gru(input_tensor)
A. The input tensor shape is incorrect; batch size and sequence length are swapped.
B. The GRU's hidden size must be equal to input size.
C. The input tensor shape is correct; the error is due to missing initial hidden state.
D. The input tensor's sequence length dimension is 20, which is too large for the GRU.
💡 Hint
Check the meaning of each dimension when batch_first=True.

Practice

1. What is the main advantage of using a GRU (Gated Recurrent Unit) in text processing tasks?
easy
A. It helps the model remember important information over time while ignoring less important details.
B. It increases the size of the input text automatically.
C. It converts text into images for better analysis.
D. It removes all punctuation from the text before processing.

Solution

  1. Step 1: Understand GRU's role in memory

    GRU units are designed to keep important information from previous steps and forget irrelevant data, helping with sequence tasks like text.
  2. Step 2: Compare options to GRU function

    Only option A, "It helps the model remember important information over time while ignoring less important details," correctly describes this memory feature; the others describe unrelated or incorrect functions.
  3. Final Answer:

    It helps the model remember important information over time while ignoring less important details. -> Option A
  4. Quick Check:

    GRU memory feature = A [OK]
Hint: GRU remembers key info, forgets noise in sequences [OK]
Common Mistakes:
  • Thinking GRU changes input size
  • Confusing GRU with data preprocessing
  • Assuming GRU outputs images
2. Which of the following is the correct way to define a GRU layer in Python using PyTorch for text input with embedding size 100 and hidden size 50?
easy
A. nn.GRU(hidden_size=100, input_size=50)
B. nn.GRU(50, 100)
C. nn.GRU(input_size=100, hidden_size=50)
D. nn.GRU(100)

Solution

  1. Step 1: Recall PyTorch GRU parameters

    PyTorch GRU expects input_size first (embedding size), then hidden_size (number of features in hidden state).
  2. Step 2: Match parameters to given sizes

    Embedding size is 100, hidden size is 50, so nn.GRU(input_size=100, hidden_size=50) is correct.
  3. Final Answer:

    nn.GRU(input_size=100, hidden_size=50) -> Option C
  4. Quick Check:

    input_size=100, hidden_size=50 = C [OK]
Hint: Input size first, hidden size second in nn.GRU() [OK]
Common Mistakes:
  • Swapping input_size and hidden_size
  • Using positional args incorrectly
  • Omitting required parameters
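For context, here is a minimal sketch of how such a layer typically sits behind an embedding layer in a text model. The vocabulary size, batch size, and sequence length are assumptions for illustration, and batch_first=True is a design choice rather than something the question requires.

```python
import torch
import torch.nn as nn

vocab_size = 1000                       # hypothetical vocabulary size
embedding = nn.Embedding(vocab_size, 100)
gru = nn.GRU(input_size=100, hidden_size=50, batch_first=True)

token_ids = torch.randint(0, vocab_size, (8, 12))   # batch=8, seq_len=12
embedded = embedding(token_ids)                      # adds a feature dim of 100
output, hidden = gru(embedded)

print(embedded.shape, output.shape, hidden.shape)
```

The embedding dimension becomes the GRU's input_size, and the GRU's hidden_size sets the feature dimension of everything downstream.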
3. Given the following PyTorch code snippet, what will be the shape of the output tensor after passing input through the GRU?
import torch
import torch.nn as nn

gru = nn.GRU(input_size=10, hidden_size=20, batch_first=True)
input = torch.randn(5, 7, 10)  # batch=5, seq_len=7, input_size=10
output, hidden = gru(input)
print(output.shape)
medium
A. (7, 5, 20)
B. (5, 7, 20)
C. (5, 20, 7)
D. (5, 7, 10)

Solution

  1. Step 1: Understand GRU output shape with batch_first=True

    Output shape is (batch_size, sequence_length, hidden_size) when batch_first=True.
  2. Step 2: Match given input sizes

    Input batch=5, seq_len=7, hidden_size=20, so output shape is (5, 7, 20).
  3. Final Answer:

    (5, 7, 20) -> Option B
  4. Quick Check:

    Output shape = (batch, seq_len, hidden_size) = B [OK]
Hint: With batch_first=True, output shape is (batch, seq_len, hidden) [OK]
Common Mistakes:
  • Confusing batch and sequence dimensions
  • Ignoring the effect of batch_first=True
  • Assuming output shape equals input shape
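The shape rules above can be checked directly. This is a small sketch that also prints the second return value, which is easy to confuse with the first; the variable names are illustrative, and the comparison assumes a single-layer, unidirectional GRU.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(5, 7, 10)               # batch=5, seq_len=7, input_size=10

gru_bf = nn.GRU(input_size=10, hidden_size=20, batch_first=True)
output, hidden = gru_bf(x)
print(output.shape)                     # hidden state at every time step
print(hidden.shape)                     # final state; batch_first does not apply here

# For a single-layer, unidirectional GRU, the last time step of `output`
# holds the same values as the final hidden state.
print(torch.allclose(output[:, -1, :], hidden[0]))

# With the default batch_first=False the same tensor is read as
# seq_len=5, batch=7, so the final hidden state has batch size 7.
gru_default = nn.GRU(input_size=10, hidden_size=20)
output_sf, hidden_sf = gru_default(x)
print(output_sf.shape, hidden_sf.shape)
```

Note that the output shape alone does not reveal a wrong batch_first setting; the hidden state's batch dimension does.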
4. You wrote this code to create a GRU for text classification but get a runtime error:
import torch
import torch.nn as nn

gru = nn.GRU(input_size=50, hidden_size=100, batch_first=True)
input = torch.randn(32, 10, 100)  # batch=32, seq_len=10, input_size=100
output, hidden = gru(input)
What is the likely cause of the error?
medium
A. Input size 100 does not match GRU input_size 50
B. Batch size 32 is too large for GRU
C. Sequence length 10 is invalid for GRU
D. GRU requires input to be 2D tensor, not 3D

Solution

  1. Step 1: Check GRU input_size vs input tensor last dimension

    GRU expects input_size=50, but input tensor last dimension is 100, causing mismatch.
  2. Step 2: Understand tensor shape requirements

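    The last dimension of the input tensor must equal the GRU's input_size parameter. This holds for both layouts: (batch, seq_len, input_size) with batch_first=True and (seq_len, batch, input_size) with the default batch_first=False.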
  3. Final Answer:

    Input size 100 does not match GRU input_size 50 -> Option A
  4. Quick Check:

    Input size mismatch = A [OK]
Hint: Match input tensor last dim to GRU input_size [OK]
Common Mistakes:
  • Blaming batch size for error
  • Thinking sequence length is invalid
  • Assuming GRU only accepts 2D input
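A short sketch of how the mismatch surfaces and one way to resolve it. It assumes batch_first=True so that the (batch, seq_len, features) layout applies, and it relies on current PyTorch releases reporting the problem as a RuntimeError; the variable names are illustrative.

```python
import torch
import torch.nn as nn

gru = nn.GRU(input_size=50, hidden_size=100, batch_first=True)
bad_input = torch.randn(32, 10, 100)    # last dimension is 100, not 50

try:
    gru(bad_input)
    error_message = None
except RuntimeError as err:
    error_message = str(err)            # message describes the size mismatch
print(error_message)

# One possible fix: make the layer's input_size match the feature dimension.
gru_fixed = nn.GRU(input_size=bad_input.size(-1), hidden_size=100,
                   batch_first=True)
output, hidden = gru_fixed(bad_input)
print(output.shape)
```

Which side to change depends on the model: if the features come from a 100-dimensional embedding, the layer should change; if the layer is fixed, the features need projecting to 50 dimensions.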
5. You want to build a GRU-based model to classify movie reviews as positive or negative. Your dataset has variable-length reviews. Which approach best handles variable-length sequences with a GRU in PyTorch?
hard
A. Convert text to images and use CNN instead of GRU.
B. Truncate all sequences to length 1 and feed to GRU.
C. Feed raw sequences directly without padding or packing.
D. Pad all sequences to the same length and use pack_padded_sequence before GRU.

Solution

  1. Step 1: Understand variable-length sequence handling

    All sequences in a batch tensor must share one length, so variable-length reviews are padded; packing the padded batch then tells the GRU each sequence's true length.
  2. Step 2: Use padding and packing for variable-length inputs

    Padding sequences to max length and using pack_padded_sequence lets GRU ignore padded parts during processing.
  3. Final Answer:

    Pad all sequences to the same length and use pack_padded_sequence before GRU. -> Option D
  4. Quick Check:

    Padding + pack_padded_sequence = D [OK]
Hint: Pad sequences and pack before GRU for variable lengths [OK]
Common Mistakes:
  • Truncating sequences too short loses info
  • Feeding raw variable-length sequences causes errors
  • Switching to CNN ignores GRU benefits
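The pad-then-pack approach can be sketched as follows. The three random tensors stand in for already-embedded reviews, and the sizes and the linear classifier head are assumptions for illustration; enforce_sorted=False is used so the batch does not need sorting by length.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import (pad_sequence, pack_padded_sequence,
                                pad_packed_sequence)

torch.manual_seed(0)
# Three hypothetical reviews, already embedded, with lengths 5, 3, and 2.
sequences = [torch.randn(5, 16), torch.randn(3, 16), torch.randn(2, 16)]
lengths = torch.tensor([len(s) for s in sequences])

padded = pad_sequence(sequences, batch_first=True)          # (3, 5, 16)
packed = pack_padded_sequence(padded, lengths, batch_first=True,
                              enforce_sorted=False)

gru = nn.GRU(input_size=16, hidden_size=8, batch_first=True)
packed_output, hidden = gru(packed)

# Unpack to a padded tensor; positions past each length are zero-filled.
output, output_lengths = pad_packed_sequence(packed_output, batch_first=True)
print(output.shape, hidden.shape, output_lengths)

# hidden[-1] holds each review's state at its own last real token,
# which is what a classifier head would typically consume.
logits = nn.Linear(8, 2)(hidden[-1])
print(logits.shape)
```

Without packing, the final hidden state of a short review would be computed after running over its padding, which can degrade the representation.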