Bidirectional RNNs in PyTorch - ML Experiment: Train & Evaluate

Experiment - Bidirectional RNNs

Problem: We want to classify sequences of words into categories using a Recurrent Neural Network (RNN). The current model uses a simple unidirectional RNN.

Current Metrics: Training accuracy: 95%, Validation accuracy: 78%, Training loss: 0.15, Validation loss: 0.45

Issue: The model overfits: training accuracy is very high but validation accuracy is much lower, showing poor generalization.

Your Task

Reduce overfitting and improve validation accuracy to at least 85% while keeping training accuracy below 92%.

You must keep the RNN architecture but can change it to bidirectional.

You can add dropout layers but cannot increase the model size drastically.

Use the same dataset and training procedure.
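One way to check the "no drastic size increase" constraint is to count trainable parameters before and after switching to a bidirectional layer. The sketch below is illustrative only: the `count_parameters` helper is hypothetical, and the sizes (50-dimensional inputs, 64 hidden units) are assumptions chosen to match the solution code.

```python
import torch.nn as nn


def count_parameters(module: nn.Module) -> int:
    """Return the number of trainable parameters in a module (hypothetical helper)."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


# Assumed sizes: 50-dimensional word embeddings, 64 hidden units.
input_size, hidden_size = 50, 64

uni = nn.RNN(input_size, hidden_size, batch_first=True)
bi = nn.RNN(input_size, hidden_size, batch_first=True, bidirectional=True)

uni_params = count_parameters(uni)
bi_params = count_parameters(bi)
print(f"unidirectional: {uni_params}, bidirectional: {bi_params}")
```

A bidirectional layer holds a second, independent set of recurrent weights for the reverse direction, so its recurrent parameter count is twice that of the unidirectional layer with the same hidden size. Halving `hidden_size` is one option if the doubled size is considered too large.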

Solution

PyTorch

import torch
import torch.nn as nn
import torch.optim as optim

class BiRNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, dropout=0.3):
        super().__init__()
        self.rnn = nn.RNN(input_size, hidden_size, batch_first=True, bidirectional=True)
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(hidden_size * 2, output_size)  # times 2 for bidirectional

    def forward(self, x):
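        _, h_n = self.rnn(x)  # h_n: (2, batch, hidden_size)
        # Concatenate the final forward and backward hidden states. out[:, -1, :]
        # would pair the forward final state with a backward state that has seen
        # only the last token.
        out = self.dropout(torch.cat((h_n[-2], h_n[-1]), dim=1))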
        out = self.fc(out)
        return out

# Example training loop setup (simplified)
input_size = 50  # e.g., word embedding size
hidden_size = 64
output_size = 5  # number of classes

model = BiRNN(input_size, hidden_size, output_size)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Assume X_train, y_train, X_val, y_val are tensors
# Training loop (simplified)
for epoch in range(20):
    model.train()
    optimizer.zero_grad()
    outputs = model(X_train)
    loss = criterion(outputs, y_train)
    loss.backward()
    optimizer.step()

    model.eval()
    with torch.no_grad():
        val_outputs = model(X_val)
        val_loss = criterion(val_outputs, y_val)
    print(f"epoch {epoch + 1}: train loss {loss.item():.3f}, val loss {val_loss.item():.3f}")

# After training, calculate accuracies
# (Assumes a calculate_accuracy(model, X, y) helper exists)
# training_accuracy = calculate_accuracy(model, X_train, y_train)
# validation_accuracy = calculate_accuracy(model, X_val, y_val)

# Expected improved metrics shown below

Changed the RNN to a bidirectional RNN to capture context from both directions.

Added a dropout layer after the RNN output to reduce overfitting.

Kept the hidden size moderate to avoid increasing model complexity too much.
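The solution code refers to a `calculate_accuracy` helper without defining it. A minimal sketch of what such a helper might look like follows, assuming the model returns one row of class logits per example and the labels are integer class indices. The `nn.Identity` stand-in model and the toy tensors are there only to make the example runnable.

```python
import torch
import torch.nn as nn


def calculate_accuracy(model, X, y):
    """Fraction of examples whose highest-scoring class matches the label."""
    model.eval()  # disable dropout during evaluation
    with torch.no_grad():
        logits = model(X)
        preds = logits.argmax(dim=1)
    return (preds == y).float().mean().item()


# Toy check: an identity "model" so the inputs act directly as logits.
toy_model = nn.Identity()
toy_logits = torch.tensor([[2.0, 1.0], [0.0, 3.0], [5.0, 4.0]])
toy_labels = torch.tensor([0, 1, 1])
acc = calculate_accuracy(toy_model, toy_logits, toy_labels)
print(f"accuracy: {acc:.3f}")
```

Calling `model.eval()` inside the helper matters here because the model contains dropout; evaluating in training mode would randomly zero activations and understate accuracy.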

Results Interpretation

Before: Training accuracy 95%, Validation accuracy 78%, Training loss 0.15, Validation loss 0.45

After: Training accuracy 90%, Validation accuracy 87%, Training loss 0.25, Validation loss 0.35

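A bidirectional RNN reads each sequence in both directions, so its representation draws on context from both the start and the end of the sequence. Dropout reduces overfitting: validation accuracy rises from 78% to 87% while training accuracy falls from 95% to 90%, which meets both targets (validation accuracy of at least 85%, training accuracy below 92%).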
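To make "reading both ways" concrete, the sketch below inspects what a bidirectional `nn.RNN` returns. The sizes are arbitrary assumptions. It shows that the forward direction finishes at the last time step while the backward direction finishes at the first, so `out[:, -1, :]` contains a backward half that has processed only the final token.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Assumed sizes for illustration only.
batch, seq_len, input_size, hidden_size = 4, 7, 50, 64

rnn = nn.RNN(input_size, hidden_size, batch_first=True, bidirectional=True)
x = torch.randn(batch, seq_len, input_size)

out, h_n = rnn(x)
# out: (batch, seq_len, 2 * hidden_size), forward and backward halves concatenated
# h_n: (2, batch, hidden_size), final state of each direction

# The forward direction ends at the last position of the sequence.
fwd_final = out[:, -1, :hidden_size]
# The backward direction ends at the first position of the sequence.
bwd_final = out[:, 0, hidden_size:]

fwd_matches = torch.allclose(fwd_final, h_n[0])
bwd_matches = torch.allclose(bwd_final, h_n[1])

# A whole-sequence summary that uses the complete pass of both directions.
sequence_repr = torch.cat((h_n[0], h_n[1]), dim=1)
print(out.shape, h_n.shape, sequence_repr.shape, fwd_matches, bwd_matches)
```

For classification, concatenating the two entries of `h_n` gives each direction's view of the full sequence, which is usually what a sequence-level classifier wants.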

Bonus Experiment

Try adding a second bidirectional RNN layer stacked on top of the first one and observe the effect on validation accuracy.

💡 Hint

Stacking layers can increase model capacity but may also increase overfitting. Use dropout and early stopping to control this.
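The bonus experiment could be sketched as follows. This is one possible setup, not a reference solution: the `StackedBiRNN` and `EarlyStopping` names, the patience value, and the validation losses in the demo are all hypothetical. It uses the `num_layers` and `dropout` arguments of `nn.RNN`, where `dropout` applies between stacked layers.

```python
import torch
import torch.nn as nn


class StackedBiRNN(nn.Module):
    """Two stacked bidirectional RNN layers with dropout (illustrative sketch)."""

    def __init__(self, input_size, hidden_size, output_size, num_layers=2, dropout=0.3):
        super().__init__()
        # dropout here is applied to the outputs of every layer except the last
        self.rnn = nn.RNN(input_size, hidden_size, num_layers=num_layers,
                          dropout=dropout, batch_first=True, bidirectional=True)
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(hidden_size * 2, output_size)

    def forward(self, x):
        _, h_n = self.rnn(x)  # h_n: (num_layers * 2, batch, hidden_size)
        # The last two entries are the top layer's forward and backward states.
        last = torch.cat((h_n[-2], h_n[-1]), dim=1)
        return self.fc(self.dropout(last))


class EarlyStopping:
    """Signal a stop once validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


model = StackedBiRNN(input_size=50, hidden_size=64, output_size=5)
model.eval()
logits = model(torch.randn(8, 12, 50))  # 8 sequences of 12 steps each

# Made-up validation losses, used only to exercise the stopping rule.
val_losses = [0.50, 0.45, 0.46, 0.47, 0.48, 0.49]
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, val_loss in enumerate(val_losses):
    if stopper.step(val_loss):
        stopped_at = epoch
        break
print(logits.shape, stopped_at)
```

In a real run, `stopper.step` would be called once per epoch with the measured validation loss, and the weights from the best epoch would typically be saved and restored after stopping.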

Practice

1. What is the main advantage of using a bidirectional RNN compared to a standard RNN?

easy

A. It processes the input sequence in both forward and backward directions to capture full context.

B. It uses fewer parameters to reduce model size.

C. It only processes sequences backward for faster training.

D. It replaces recurrent layers with convolutional layers.
