PyTorch How-To Beginner · 3 min read

How to Use nn.GRU in PyTorch: Syntax and Example

In PyTorch, nn.GRU creates a gated recurrent unit (GRU) layer for sequence data. You initialize it with an input size and a hidden size, then pass input tensors through it to get an output tensor and the final hidden state. It is useful for tasks such as time-series forecasting and text processing.

📐 Syntax

The nn.GRU constructor creates a GRU layer with these main parameters:

  • input_size: Number of expected features in the input.
  • hidden_size: Number of features in the hidden state.
  • num_layers: Number of stacked GRU layers (default 1).
  • batch_first: If True, input and output tensors have shape (batch, seq, feature).

When you call the GRU layer, it returns two tensors: output (all hidden states for each time step) and hidden (last hidden state for each layer).

python
import torch
import torch.nn as nn
gru = nn.GRU(input_size=10, hidden_size=20, num_layers=2, batch_first=True)
input_tensor = torch.randn(5, 3, 10)  # batch=5, seq_len=3, features=10
output, hidden = gru(input_tensor)

💻 Example

This example shows how to create a GRU, pass a random input tensor, and print the shapes of the output and hidden state tensors.

python
import torch
import torch.nn as nn

# Create GRU layer
input_size = 8
hidden_size = 16
num_layers = 1
batch_size = 4
seq_len = 5

gru = nn.GRU(input_size=input_size, hidden_size=hidden_size, num_layers=num_layers, batch_first=True)

# Random input: batch of 4 sequences, each of length 5, with 8 features
input_tensor = torch.randn(batch_size, seq_len, input_size)

# Forward pass
output, hidden = gru(input_tensor)

print('Output shape:', output.shape)  # (batch_size, seq_len, hidden_size)
print('Hidden shape:', hidden.shape)  # (num_layers, batch_size, hidden_size)
Output
Output shape: torch.Size([4, 5, 16])
Hidden shape: torch.Size([1, 4, 16])

⚠️ Common Pitfalls

  • Input shape mismatch: If batch_first=True, input must be (batch, seq_len, features); otherwise, (seq_len, batch, features). Note that the GRU only validates the feature dimension — if you merely swap the batch and sequence dimensions, no error is raised and the layer silently processes your data with the wrong interpretation.
  • Hidden state shape: The hidden state shape is (num_layers, batch, hidden_size), which can confuse beginners.
  • Forgetting to initialize hidden state: You can pass an initial hidden state, but if omitted, it defaults to zeros.
  • Using wrong input_size: The input feature size must match the GRU's input_size parameter.
python
import torch
import torch.nn as nn

gru = nn.GRU(input_size=10, hidden_size=20, batch_first=True)

# Wrong feature size: this GRU expects input_size=10, but we pass 8 features.
# (Merely swapping batch and seq_len would NOT raise an error — only the
# feature dimension is checked.)
wrong_input = torch.randn(5, 3, 8)

try:
    output, hidden = gru(wrong_input)
except RuntimeError as e:
    print('Error:', e)

# Correct input shape: (batch=5, seq_len=3, features=10)
correct_input = torch.randn(5, 3, 10)
output, hidden = gru(correct_input)
print('Output shape with correct input:', output.shape)
Output
Error: input.size(-1) must be equal to input_size. Expected 10, got 8
Output shape with correct input: torch.Size([5, 3, 20])
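The optional initial hidden state mentioned above can also be passed explicitly. This is a minimal sketch that supplies a zero tensor as the initial hidden state (the same default PyTorch uses when it is omitted), mainly to show its expected shape — note that it is (num_layers, batch, hidden_size) even when batch_first=True:

```python
import torch
import torch.nn as nn

gru = nn.GRU(input_size=10, hidden_size=20, num_layers=2, batch_first=True)
input_tensor = torch.randn(4, 6, 10)  # batch=4, seq_len=6, features=10

# Initial hidden state: (num_layers, batch, hidden_size), even with batch_first=True
h0 = torch.zeros(2, 4, 20)

output, hidden = gru(input_tensor, h0)
print(output.shape)  # torch.Size([4, 6, 20])
print(hidden.shape)  # torch.Size([2, 4, 20])
```

Because h0 here is all zeros, the result is identical to calling gru(input_tensor) without it.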

📊 Quick Reference

Remember these key points when using nn.GRU:

  • Input shape: (batch, seq_len, input_size) if batch_first=True, else (seq_len, batch, input_size).
  • Output: All hidden states for each time step, shape (batch, seq_len, hidden_size) or (seq_len, batch, hidden_size).
  • Hidden: Last hidden state for each layer, shape (num_layers, batch, hidden_size).
  • Initial hidden state: Optional, defaults to zeros if not provided.
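For completeness, this short sketch shows the transposed shapes you get with the default batch_first=False, where the sequence dimension comes first:

```python
import torch
import torch.nn as nn

# batch_first defaults to False: input is (seq_len, batch, input_size)
gru = nn.GRU(input_size=8, hidden_size=16)
input_tensor = torch.randn(5, 4, 8)  # seq_len=5, batch=4, features=8

output, hidden = gru(input_tensor)
print(output.shape)  # torch.Size([5, 4, 16]) -> (seq_len, batch, hidden_size)
print(hidden.shape)  # torch.Size([1, 4, 16]) -> (num_layers, batch, hidden_size)
```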

Key Takeaways

Initialize nn.GRU with input_size and hidden_size matching your data features.
Input tensor shape depends on batch_first; usually (batch, seq_len, features) is easier.
GRU returns output for all time steps and the last hidden state separately.
Pass initial hidden state if you want to control it; otherwise, it defaults to zeros.
Check tensor shapes carefully to avoid runtime errors.
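One way to check the takeaway that output holds every time step while hidden holds only the last: for a single-layer, unidirectional GRU with batch_first=True, the last time step of output equals the (only) layer of hidden.

```python
import torch
import torch.nn as nn

gru = nn.GRU(input_size=8, hidden_size=16, batch_first=True)
x = torch.randn(4, 5, 8)

output, hidden = gru(x)

# Last time step of output == last hidden state of the single layer
print(torch.allclose(output[:, -1, :], hidden[0]))  # True
```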