PyTorch · ML · ~20 mins

Detaching from computation graph in PyTorch - ML Experiment: Train & Evaluate

Experiment - Detaching from computation graph
Problem: You have a PyTorch model training loop where you want to use intermediate tensor values for logging or further calculations without affecting gradient computation.
Current Metrics: Training loss decreases smoothly, but memory usage is high and training slows down over time.
Issue: Because intermediate tensors are not detached, PyTorch keeps the entire computation graph alive, leading to high memory use and slower training.
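To see why this happens, here is a minimal sketch (the model and variable names are illustrative, not from the exercise) of the classic failure mode: accumulating an undetached loss tensor keeps every iteration's graph reachable, so memory grows with each step.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
criterion = nn.MSELoss()

running_loss = 0.0
for _ in range(3):
    x = torch.randn(8, 10)
    y = torch.randn(8, 1)
    loss = criterion(model(x), y)
    # BAD: running_loss now references the computation graph of every step
    running_loss = running_loss + loss
    # GOOD alternative: running_loss += loss.detach()  (or loss.item())

# The accumulated tensor still requires grad, evidence that the graphs
# from all iterations are being kept alive
print(running_loss.requires_grad)
```

Because `running_loss` still requires grad, none of the per-iteration graphs can be freed until it is discarded.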
Your Task
Modify the training loop to detach intermediate tensors from the computation graph to reduce memory usage and maintain training speed without affecting model accuracy.
Do not change the model architecture.
Keep the training loss and accuracy calculation unchanged.
Only modify how intermediate tensors are handled.
Solution
PyTorch
import torch
import torch.nn as nn
import torch.optim as optim

# Simple model
class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(10, 1)
    def forward(self, x):
        return self.linear(x)

# Data
X = torch.randn(100, 10)
y = torch.randn(100, 1)

model = SimpleModel()
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

for epoch in range(5):
    optimizer.zero_grad()
    output = model(X)
    # Detach output for logging without affecting gradients
    output_detached = output.detach()
    loss = criterion(output, y)
    loss.backward()
    optimizer.step()
    print(f"Epoch {epoch+1}, Loss: {loss.item():.4f}, Detached output mean: {output_detached.mean().item():.4f}")
Added .detach() to the model output tensor before using it for logging.
Kept the loss and backward pass using the original output tensor to preserve gradient flow.
This reduces memory usage by not tracking operations on the detached tensor.
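The key property of `.detach()` can be checked in isolation. This short sketch (standalone, not part of the training loop above) shows that a detached tensor shares data with the original but is cut out of the graph, so operations on it are not tracked:

```python
import torch

x = torch.randn(3, requires_grad=True)
y = x * 2          # y is part of the computation graph
z = y.detach()     # z shares storage with y but is cut from the graph

print(y.requires_grad)       # True
print(z.requires_grad)       # False

# Operations on z are not recorded, so they add nothing to the graph
logged = z.mean()
print(logged.requires_grad)  # False
```

Note that because `z` shares storage with `y`, in-place writes to `z` would also change `y`; for pure logging that is not a concern.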
Results Interpretation

Before: Training loss decreased but memory usage was high and training slowed down over epochs.

After: Training loss decreased similarly, but memory usage dropped and training speed improved.

Detaching tensors from the computation graph prevents PyTorch from tracking operations on them, saving memory and improving training efficiency without affecting gradient updates.
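A complementary tool worth knowing: when no gradients are needed at all, such as during validation, `torch.no_grad()` suppresses graph construction entirely, so nothing needs to be detached afterwards. A minimal sketch (model and data are illustrative):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
X_val = torch.randn(20, 10)

# Inside no_grad, no graph is built at all
with torch.no_grad():
    val_output = model(X_val)

print(val_output.requires_grad)  # False
```

`.detach()` cuts a single tensor out of an existing graph; `torch.no_grad()` prevents the graph from being built in the first place.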
Bonus Experiment
Try detaching intermediate tensors inside the model's forward method and observe the effect on training and gradients.
💡 Hint
Detach only tensors used for logging or auxiliary calculations, not those needed for loss computation.
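For the bonus experiment, this hypothetical two-layer model (names are illustrative) sketches what detaching inside `forward` does: gradients stop at the detach point, so earlier layers receive no updates.

```python
import torch
import torch.nn as nn

class DetachedModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.first = nn.Linear(10, 5)
        self.second = nn.Linear(5, 1)

    def forward(self, x):
        h = self.first(x)
        h = h.detach()  # gradients will NOT flow back into self.first
        return self.second(h)

model = DetachedModel()
out = model(torch.randn(4, 10))
out.sum().backward()

# self.second receives gradients; self.first does not
print(model.second.weight.grad is None)  # False
print(model.first.weight.grad is None)   # True
```

This is why detaching is safe for logging but harmful inside the loss path: any parameters upstream of the detach point stop training.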