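A GRU (Gated Recurrent Unit) helps computers understand text by reading it one word at a time and using gates to keep important information from earlier words while forgetting less important details. It is a lightweight recurrent layer for reading and predicting text.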
GRU for text in NLP
torch.nn.GRU(input_size, hidden_size, num_layers=1, batch_first=False, dropout=0, bidirectional=False)
input_size is the number of features in each input word vector.
hidden_size is how many features the GRU remembers at each step.
gru = torch.nn.GRU(input_size=10, hidden_size=20)
gru = torch.nn.GRU(input_size=50, hidden_size=100, num_layers=2, batch_first=True)
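To show where input_size comes from in practice, here is a minimal sketch that feeds word ids through an embedding layer and then into the second GRU above. The vocabulary size, batch size, and sentence length are made-up values for illustration only.

```python
import torch
import torch.nn as nn

vocab_size = 1000    # hypothetical vocabulary size
embedding_dim = 50   # length of each word vector; this becomes the GRU's input_size
hidden_size = 100

embedding = nn.Embedding(vocab_size, embedding_dim)
gru = nn.GRU(input_size=embedding_dim, hidden_size=hidden_size,
             num_layers=2, batch_first=True)

# 4 sentences of 12 word ids each (random ids stand in for real text)
token_ids = torch.randint(0, vocab_size, (4, 12))

vectors = embedding(token_ids)   # (batch, seq_len, embedding_dim)
output, hidden = gru(vectors)    # output: one vector per word; hidden: one per layer

print('Vectors shape:', vectors.shape)
print('Output shape:', output.shape)
print('Hidden shape:', hidden.shape)
```

The last dimension of the embedded tensor is what has to equal input_size; the other two dimensions can change from batch to batch.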
This code creates a small GRU to process a batch of two short sentences with three words each. Each word is a vector of 5 numbers, and the GRU keeps a hidden state of 4 features at each step. The code prints the shapes and values of the output and of the final hidden state.
import torch
import torch.nn as nn

# Sample text data: batch of 2 sentences, each with 3 words, each word represented by 5 features
input_data = torch.randn(2, 3, 5)  # batch_size=2, seq_len=3, input_size=5

# Create GRU: input_size=5, hidden_size=4, batch_first=True
gru = nn.GRU(input_size=5, hidden_size=4, batch_first=True)

# Forward pass
output, hidden = gru(input_data)

print('Output shape:', output.shape)
print('Output:', output)
print('Hidden shape:', hidden.shape)
print('Hidden:', hidden)
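The two return values are closely related. As a small sketch using the same sizes as the example above: for a single-layer, one-direction GRU, the output at the last word of each sentence is the same vector as the final hidden state. The fixed seed is only there to make reruns repeatable.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed so reruns give the same numbers

gru = nn.GRU(input_size=5, hidden_size=4, batch_first=True)
input_data = torch.randn(2, 3, 5)  # batch_size=2, seq_len=3, input_size=5

output, hidden = gru(input_data)

last_step = output[:, -1, :]  # (2, 4): output at the final word of each sentence
final_hidden = hidden[0]      # (2, 4): final hidden state of the only layer

print('Last step shape:', last_step.shape)
print('Final hidden shape:', final_hidden.shape)
print('Same values:', torch.allclose(last_step, final_hidden))
```

This only holds for one layer and one direction. With more layers or bidirectional=True, hidden holds several final states and the comparison needs more care.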
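GRU is simpler than LSTM: it uses two gates and no separate cell state, so it has fewer parameters and is usually faster to train, while still remembering important information.
Set batch_first=True if your input shape is (batch, sequence, features). The default layout is (sequence, batch, features).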
Hidden state shape is (num_layers * num_directions, batch, hidden_size).
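The shape rule for the hidden state can be checked with a short sketch. The batch size, sequence length, and feature sizes below are arbitrary values chosen for illustration.

```python
import torch
import torch.nn as nn

batch, seq_len, input_size, hidden_size = 3, 6, 8, 16
x = torch.randn(batch, seq_len, input_size)

configs = [
    dict(num_layers=1, bidirectional=False),
    dict(num_layers=2, bidirectional=False),
    dict(num_layers=2, bidirectional=True),
]

shapes = []
for cfg in configs:
    gru = nn.GRU(input_size, hidden_size, batch_first=True, **cfg)
    output, hidden = gru(x)
    # output: (batch, seq_len, num_directions * hidden_size)
    # hidden: (num_layers * num_directions, batch, hidden_size)
    shapes.append((tuple(output.shape), tuple(hidden.shape)))
    print(cfg, '->', shapes[-1])
```

Note that batch_first=True changes the layout of the input and output tensors only. The hidden state keeps the batch in its second dimension either way.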
GRU helps models remember important parts of text while ignoring less important parts.
Use GRU for tasks like text prediction, classification, and translation.
input_size must equal the length of each word vector in your data, while hidden_size is a design choice that you tune for the task.
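As one possible way to use a GRU for text classification, the sketch below stacks an embedding layer, a GRU, and a linear layer. The class name GRUClassifier and all sizes are hypothetical and chosen only for illustration; this is not a tuned model.

```python
import torch
import torch.nn as nn


class GRUClassifier(nn.Module):
    """Hypothetical sentence classifier: embedding -> GRU -> linear layer."""

    def __init__(self, vocab_size, embedding_dim, hidden_size, num_classes):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.gru = nn.GRU(embedding_dim, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, token_ids):
        vectors = self.embedding(token_ids)  # (batch, seq_len, embedding_dim)
        _, hidden = self.gru(vectors)        # hidden: (1, batch, hidden_size)
        return self.fc(hidden[-1])           # (batch, num_classes)


model = GRUClassifier(vocab_size=500, embedding_dim=32, hidden_size=64, num_classes=3)

# 8 sentences of 15 word ids each (random ids stand in for real text)
token_ids = torch.randint(0, 500, (8, 15))
logits = model(token_ids)

print('Logits shape:', logits.shape)
```

The final hidden state acts as a summary of the whole sentence, so the linear layer only needs that one vector per sentence to produce class scores.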
Practice
Solution
Step 1: Understand GRU's role in memory
GRU units are designed to keep important information from previous steps and forget irrelevant data, which helps with sequence tasks like text.
Step 2: Compare options to GRU function
Only "It helps the model remember important information over time while ignoring less important details." correctly describes this memory feature; the others describe unrelated or incorrect functions.
Final Answer:
It helps the model remember important information over time while ignoring less important details. -> Option A
Quick Check:
GRU memory feature = A [OK]
- Thinking GRU changes input size
- Confusing GRU with data preprocessing
- Assuming GRU outputs images
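To make the "keep and forget" idea concrete, here is a sketch of one GRU step written out by hand and compared with PyTorch's nn.GRUCell. It follows the gate equations given in the PyTorch documentation; the names r, z, and n (reset gate, update gate, candidate state) and the sizes are chosen here for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

cell = nn.GRUCell(input_size=5, hidden_size=4)
x = torch.randn(2, 5)  # current word vectors for a batch of 2
h = torch.randn(2, 4)  # hidden state carried over from the previous step

# Input and hidden projections; each holds the r, z, n parts side by side
gi = x @ cell.weight_ih.T + cell.bias_ih
gh = h @ cell.weight_hh.T + cell.bias_hh
i_r, i_z, i_n = gi.chunk(3, dim=1)
h_r, h_z, h_n = gh.chunk(3, dim=1)

r = torch.sigmoid(i_r + h_r)    # reset gate: how much old state feeds the candidate
z = torch.sigmoid(i_z + h_z)    # update gate: how much old state to keep
n = torch.tanh(i_n + r * h_n)   # candidate new state

h_manual = (1 - z) * n + z * h  # blend of new candidate and old state
h_ref = cell(x, h)

print('Matches nn.GRUCell:', torch.allclose(h_manual, h_ref, atol=1e-6))
```

Where z is close to 1 the old hidden value is carried forward almost unchanged, and where z is close to 0 it is replaced by the candidate. That is the sense in which a GRU remembers some things and forgets others.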
Solution
Step 1: Recall PyTorch GRU parameters
PyTorch GRU expects input_size first (the embedding size), then hidden_size (the number of features in the hidden state).
Step 2: Match parameters to given sizes
The embedding size is 100 and the hidden size is 50, so nn.GRU(input_size=100, hidden_size=50) is correct.
Final Answer:
nn.GRU(input_size=100, hidden_size=50) -> Option C
Quick Check:
input_size=100, hidden_size=50 = C [OK]
- Swapping input_size and hidden_size
- Using positional args incorrectly
- Omitting required parameters
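A quick way to check the argument order is to build the layer and inspect its attributes. The sketch below compares keyword arguments, positional arguments, and a swapped call; the variable names are illustrative.

```python
import torch.nn as nn

gru_keywords = nn.GRU(input_size=100, hidden_size=50)
gru_positional = nn.GRU(100, 50)  # same as above: input_size comes first
gru_swapped = nn.GRU(50, 100)     # sizes swapped: expects 50-dimensional word vectors

for name, gru in [('keywords', gru_keywords),
                  ('positional', gru_positional),
                  ('swapped', gru_swapped)]:
    # weight_ih_l0 has shape (3 * hidden_size, input_size)
    print(name, gru.input_size, gru.hidden_size, tuple(gru.weight_ih_l0.shape))
```

The swapped layer builds without complaint. The mistake only surfaces later, when 100-dimensional embeddings are passed in.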
import torch
import torch.nn as nn

gru = nn.GRU(input_size=10, hidden_size=20, batch_first=True)
input = torch.randn(5, 7, 10)  # batch=5, seq_len=7, input_size=10
output, hidden = gru(input)
print(output.shape)
Solution
Step 1: Understand GRU output shape with batch_first=True
Output shape is (batch_size, sequence_length, hidden_size) when batch_first=True.
Step 2: Match given input sizes
The input has batch=5 and seq_len=7, and the GRU has hidden_size=20, so the output shape is (5, 7, 20).
Final Answer:
(5, 7, 20) -> Option B
Quick Check:
Output shape = (batch, seq_len, hidden_size) = B [OK]
- Confusing batch and sequence dimensions
- Ignoring the effect of batch_first=True
- Assuming output shape equals input shape
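The effect of batch_first is easy to miss because the output shape can look the same in both layouts. The sketch below passes the same tensor to two GRUs that differ only in batch_first; the variable names are illustrative.

```python
import torch
import torch.nn as nn

x = torch.randn(5, 7, 10)

gru_bf = nn.GRU(input_size=10, hidden_size=20, batch_first=True)
gru_default = nn.GRU(input_size=10, hidden_size=20)  # batch_first=False

out_bf, h_bf = gru_bf(x)                 # x read as batch=5, seq_len=7
out_default, h_default = gru_default(x)  # x read as seq_len=5, batch=7

print('batch_first=True  ->', tuple(out_bf.shape), tuple(h_bf.shape))
print('batch_first=False ->', tuple(out_default.shape), tuple(h_default.shape))
```

Both outputs have shape (5, 7, 20), but the hidden states differ: one has 5 entries in the batch dimension and the other has 7. The hidden shape shows which dimension the GRU treated as the batch.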
gru = nn.GRU(input_size=50, hidden_size=100)
input = torch.randn(32, 10, 100)  # batch=32, seq_len=10, input_size=100
output, hidden = gru(input)

What is the likely cause of the error?
Solution
Step 1: Check GRU input_size vs input tensor last dimension
The GRU was created with input_size=50, but the input tensor's last dimension is 100, so the sizes do not match.
Step 2: Understand tensor shape requirements
GRU input shape is (seq_len, batch, input_size) by default, or (batch, seq_len, input_size) with batch_first=True. In both layouts the last dimension must equal the GRU's input_size parameter.
Final Answer:
Input size 100 does not match GRU input_size 50 -> Option A
Quick Check:
Input size mismatch = A [OK]
- Blaming batch size for error
- Thinking sequence length is invalid
- Assuming GRU only accepts 2D input
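The mismatch can be reproduced and fixed in a few lines. This sketch assumes batch_first=True so that the (32, 10, ...) tensors match the batch and sequence sizes named in the question's comment; the variable names are illustrative.

```python
import torch
import torch.nn as nn

gru = nn.GRU(input_size=50, hidden_size=100, batch_first=True)

bad_input = torch.randn(32, 10, 100)  # last dimension 100 does not match input_size=50
good_input = torch.randn(32, 10, 50)  # last dimension 50 matches input_size=50

try:
    gru(bad_input)
    error_message = None
except RuntimeError as err:
    error_message = str(err)
    print('GRU rejected the input:', error_message)

output, hidden = gru(good_input)
print('Output shape with matching input:', output.shape)
```

Either side can be changed to fix the error: create the GRU with input_size=100, or feed it 50-dimensional vectors. Which one is right depends on the size of the word vectors in the data.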
Solution
Step 1: Understand variable-length sequence handling
Within a batch, a GRU needs either fixed-length inputs or packed sequences to handle variable lengths efficiently.
Step 2: Use padding and packing for variable-length inputs
Padding sequences to the max length and using pack_padded_sequence lets the GRU ignore the padded parts during processing.
Final Answer:
Pad all sequences to the same length and use pack_padded_sequence before GRU. -> Option D
Quick Check:
Padding + pack_padded_sequence = D [OK]
- Truncating sequences too short loses info
- Feeding raw variable-length sequences causes errors
- Switching to CNN ignores GRU benefits
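The padding and packing steps can be sketched with PyTorch's torch.nn.utils.rnn helpers. The three random sequences and all sizes below are made up for illustration.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)

# Three "sentences" of 5, 3, and 2 words; each word is a vector of 8 features
sequences = [torch.randn(5, 8), torch.randn(3, 8), torch.randn(2, 8)]
lengths = torch.tensor([len(s) for s in sequences])

# Pad with zeros to the longest sentence: (batch=3, max_len=5, features=8)
padded = pad_sequence(sequences, batch_first=True)

# Pack so the GRU skips the padded positions
packed = pack_padded_sequence(padded, lengths, batch_first=True, enforce_sorted=False)

gru = nn.GRU(input_size=8, hidden_size=16, batch_first=True)
packed_output, hidden = gru(packed)

# Unpack back to a padded tensor for later layers
output, output_lengths = pad_packed_sequence(packed_output, batch_first=True)

print('Padded input shape:', padded.shape)
print('Output shape:', output.shape)
print('Hidden shape:', hidden.shape)
print('Lengths:', output_lengths)
```

With packing, the final hidden state of each sentence comes from its last real word instead of from a padded position, and the unpacked output is zero wherever the input was padding.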
