PyTorch ~15 mins

requires_grad flag in PyTorch - Deep Dive

Overview - requires_grad flag
What is it?
The requires_grad flag in PyTorch is a setting on tensors that tells the system whether to track operations on them for automatic differentiation. When set to True, PyTorch records all operations on the tensor so it can compute gradients later, which are essential for training models. If set to False, the tensor is treated as a constant, and no gradients are computed for it. This flag helps control which parts of a model learn and update during training.
Why it matters
Without the requires_grad flag, PyTorch wouldn't know which tensors need gradients for learning. Training neural networks would then be impossible or inefficient: the system would either waste time computing unnecessary gradients or fail to update parameters. The flag gives precise control over what learns, saves memory and computation, and enables techniques like freezing parts of a model or working with fixed inputs.
Where it fits
Before learning about requires_grad, you should understand tensors and basic PyTorch operations. After this, you will learn about backpropagation, optimizers, and how gradients update model parameters during training.
Mental Model
Core Idea
The requires_grad flag tells PyTorch which tensors to watch so it can calculate how changing them affects the final result.
Think of it like...
It's like marking certain ingredients in a recipe to track how changing their amounts affects the taste, while ignoring others that stay fixed.
Tensor (requires_grad=True) ──▶ Track operations ──▶ Build computation graph ──▶ Compute gradients during backward()
Tensor (requires_grad=False) ──▶ No tracking ──▶ Treated as constant
Build-Up - 7 Steps
1
Foundation: What is the requires_grad flag
Concept: Introduces the requires_grad flag as a property of tensors that controls gradient tracking.
In PyTorch, every tensor has a requires_grad attribute. By default, it is False. When you create a tensor with requires_grad=True, PyTorch starts tracking all operations on it to compute gradients later. For example:

import torch
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

This tensor x will now record operations for gradient calculation.
Result
The tensor x is now set to track operations for gradients.
Understanding that requires_grad controls whether PyTorch tracks operations is the foundation for learning how automatic differentiation works.
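As a runnable sketch of the step above (the tensor values are arbitrary):

```python
import torch

# By default, tensors do not track gradients
a = torch.tensor([1.0, 2.0, 3.0])
print(a.requires_grad)  # False

# Opt in at creation time
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
print(x.requires_grad)  # True

# Results of operations on x inherit tracking via a grad_fn
y = x * 2
print(y.requires_grad)        # True
print(y.grad_fn is not None)  # True: y records how it was made
```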
2
Foundation: Why gradients need tracking
Concept: Explains why PyTorch needs to track operations on tensors to compute gradients for learning.
Gradients tell us how much changing a tensor changes the output. PyTorch builds a computation graph from operations on tensors with requires_grad=True. When you call backward(), PyTorch walks this graph in reverse to compute gradients. Without tracking, gradients can't be computed. Example:

x = torch.tensor(2.0, requires_grad=True)
y = x * x  # y = x^2

Calling y.backward() computes dy/dx = 2x = 4.
Result
PyTorch can compute gradients for x because it tracked the operation y = x^2.
Knowing that tracking is essential for gradient calculation helps you control which parts of your model learn.
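The derivative above can be checked end to end; a minimal sketch:

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x * x        # y = x^2, recorded in the graph

y.backward()     # walk the graph backward, applying the chain rule
print(x.grad)    # tensor(4.) -- dy/dx = 2x = 4 at x = 2
```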
3
Intermediate: Changing requires_grad after creation
🤔 Before reading on: do you think you can turn requires_grad on or off after creating a tensor? Commit to yes or no.
Concept: Shows how to enable or disable gradient tracking on existing tensors.
You can change requires_grad on an existing tensor with the in-place .requires_grad_() method:

x = torch.tensor([1.0, 2.0, 3.0])  # requires_grad=False by default
x.requires_grad_(True)             # Now tracks gradients

Alternatively, you can create a new tracked tensor from the same data:

y = x.detach().requires_grad_(True)

This flexibility helps when you want to freeze or unfreeze parts of a model.
Result
The tensor x now tracks gradients after calling requires_grad_(True).
Understanding how to toggle requires_grad allows dynamic control over learning during training.
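A short runnable sketch of the toggle (the trailing underscore is PyTorch's convention for in-place methods):

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0])  # requires_grad=False by default
x.requires_grad_(True)             # in-place toggle: tracking on
print(x.requires_grad)             # True

x.requires_grad_(False)            # tracking back off
print(x.requires_grad)             # False
```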
4
Intermediate: requires_grad and model parameters
🤔 Before reading on: do you think all model parameters have requires_grad=True by default? Commit to yes or no.
Concept: Explains that model parameters usually have requires_grad=True so they update during training, but this can be changed to freeze layers.
In PyTorch, model parameters (like weights and biases) have requires_grad=True by default, so they learn during training. Example:

for param in model.parameters():
    print(param.requires_grad)  # Usually True

To freeze a layer (stop it from learning), set requires_grad=False:

for param in model.layer.parameters():
    param.requires_grad = False

This prevents updates to that layer during training.
Result
Frozen layers do not compute gradients and do not update during training.
Knowing how requires_grad controls learning at the parameter level is key for transfer learning and fine-tuning.
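A runnable sketch of freezing; the two-layer Sequential model here is an invented stand-in for a real pretrained network:

```python
import torch.nn as nn

# A tiny illustrative model (the layer sizes are arbitrary)
model = nn.Sequential(
    nn.Linear(4, 8),   # pretend this is a pretrained layer to freeze
    nn.Linear(8, 2),   # head we want to keep training
)

# Parameters track gradients by default
print(all(p.requires_grad for p in model.parameters()))  # True

# Freeze the first layer: its weights will no longer receive gradients
for param in model[0].parameters():
    param.requires_grad = False

trainable = [p for p in model.parameters() if p.requires_grad]
print(len(trainable))  # 2 -- only the head's weight and bias remain trainable
```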
5
Intermediate: requires_grad and the no_grad context
🤔 Before reading on: does setting requires_grad=False inside a no_grad block affect the tensor permanently? Commit to yes or no.
Concept: Introduces torch.no_grad() context to temporarily disable gradient tracking during inference or evaluation.
Sometimes you want to run code without tracking gradients, such as during model evaluation. PyTorch provides the torch.no_grad() context manager:

with torch.no_grad():
    output = model(input)

Inside this block no operations are recorded, so outputs have requires_grad=False even when the inputs normally track gradients. This saves memory and computation. Note: this does not change the requires_grad flag of existing tensors.
Result
Operations inside no_grad do not track gradients, but tensors keep their original requires_grad setting outside.
Understanding no_grad helps optimize inference and prevents accidental gradient computations.
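A minimal sketch; plain tensors stand in for the model and input from the text:

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)

with torch.no_grad():
    y = x * 2
    print(y.requires_grad)  # False -- no graph is built inside the block

z = x * 2
print(z.requires_grad)      # True -- tracking resumes outside
print(x.requires_grad)      # True -- the flag itself was never changed
```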
6
Advanced: Effect of requires_grad on memory and speed
🤔 Before reading on: do you think enabling requires_grad always slows down computation? Commit to yes or no.
Concept: Explains how requires_grad=True increases memory and computation because PyTorch stores intermediate results for backpropagation.
When requires_grad=True, PyTorch saves the intermediate tensors needed to compute gradients during backward(). This uses more memory and adds bookkeeping to forward passes. Example:

x = torch.randn(1000, 1000, requires_grad=True)
y = x * 2

PyTorch stores the computation graph that produced y. With requires_grad=False, no graph is stored, so the forward pass is faster and uses less memory. This tradeoff matters for large models and during inference.
Result
Enabling requires_grad increases resource use but is necessary for training.
Knowing the resource cost of requires_grad helps you optimize training and inference workflows.
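The difference is visible in whether a result carries a grad_fn; a small sketch:

```python
import torch

# With tracking on, each result stores a grad_fn linking it into the graph
a = torch.randn(1000, 1000, requires_grad=True)
b = a * 2
print(b.grad_fn is not None)  # True -- graph node kept for backward()

# With tracking off, no graph is built: less memory, less bookkeeping
c = torch.randn(1000, 1000)
d = c * 2
print(d.grad_fn)              # None
```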
7
Expert: Subtleties with requires_grad and detach()
🤔 Before reading on: does calling detach() on a tensor with requires_grad=True keep or remove gradient tracking? Commit to keep or remove.
Concept: Explores how detach() creates a new tensor without gradient tracking, breaking the computation graph.
Calling detach() on a tensor returns a new tensor that shares the same data but does not track gradients:

x = torch.tensor([1.0, 2.0], requires_grad=True)
y = x.detach()

Now y.requires_grad is False. Operations on y won't be tracked, so gradients won't flow back through y. This is useful for stopping gradients at part of a model or saving memory. However, because the data is shared, modifying a detached tensor in place can cause subtle bugs if you expect gradients elsewhere.
Result
Detached tensors do not track gradients, effectively cutting off backpropagation.
Understanding detach() and requires_grad interaction is crucial to avoid silent bugs in complex models.
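A runnable sketch of how detach() cuts the gradient path:

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
y = x.detach()          # shares data, no gradient tracking
print(y.requires_grad)  # False

# Gradients flow through the x branch but stop at the detached branch
z = (x * 3).sum() + (y * 10).sum()
z.backward()
print(x.grad)           # tensor([3., 3.]) -- only the x * 3 path contributed
```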
Under the Hood
PyTorch builds a dynamic computation graph during the forward pass by recording operations on tensors with requires_grad=True. Each tensor stores a reference to a Function object that created it, forming a graph of operations. When backward() is called, PyTorch traverses this graph in reverse order, applying the chain rule to compute gradients for each tensor. Tensors with requires_grad=False are treated as constants and do not create nodes in the graph, so no gradients flow through them.
Why designed this way?
PyTorch uses dynamic graphs to allow flexible model definitions and easy debugging. The requires_grad flag lets users control which tensors participate in gradient computation, optimizing memory and computation. Alternatives like static graphs (used in other frameworks) require full graph definition before execution, limiting flexibility. The design balances ease of use, performance, and flexibility.
Input tensors (requires_grad=True) ──▶ Operations ──▶ Computation graph nodes
                      │
                      ▼
            Backward pass computes gradients
                      │
                      ▼
           Gradients stored in .grad attributes

Tensors with requires_grad=False ──▶ No graph nodes ──▶ No gradients computed
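The grad_fn references described above can be inspected directly; a small sketch:

```python
import torch

x = torch.tensor(3.0, requires_grad=True)
y = x * x
z = y + 1

# Each result points at the Function object that created it
print(z.grad_fn)                 # an AddBackward0 node
print(z.grad_fn.next_functions)  # edges leading back toward y's MulBackward0

z.backward()   # reverse traversal applies the chain rule node by node
print(x.grad)  # tensor(6.) -- dz/dx = 2x = 6 at x = 3
```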
Myth Busters - 4 Common Misconceptions
Quick: If a tensor has requires_grad=False, can it ever get gradients during backward()? Commit to yes or no.
Common Belief: If requires_grad=False, the tensor can still get gradients if used in computations.
Reality: Tensors with requires_grad=False do not track operations and do not receive gradients during backward(). They are treated as constants.
Why it matters: Assuming gradients flow through requires_grad=False tensors causes confusion and bugs when parameters don't update as expected.
Quick: Does setting requires_grad=True on a tensor automatically make it a model parameter? Commit to yes or no.
Common Belief: Setting requires_grad=True makes a tensor a model parameter that updates during training.
Reality: requires_grad=True only enables gradient tracking. To update during training, the tensor must be registered as a model parameter (e.g., as an nn.Parameter).
Why it matters: Confusing these leads to tensors not updating during optimizer steps, causing training failures.
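A small illustrative module (the Scale class and its attribute names are invented for demonstration) makes the distinction concrete:

```python
import torch
import torch.nn as nn

class Scale(nn.Module):
    def __init__(self):
        super().__init__()
        # Registered: shows up in parameters(), so optimizers see it
        self.w = nn.Parameter(torch.tensor(1.0))
        # Not registered: tracks gradients, but parameters() ignores it
        self.v = torch.tensor(1.0, requires_grad=True)

m = Scale()
names = [name for name, _ in m.named_parameters()]
print(names)  # ['w'] -- only the nn.Parameter is registered
```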
Quick: Does torch.no_grad() permanently change requires_grad flags on tensors? Commit to yes or no.
Common Belief: Using torch.no_grad() changes requires_grad flags permanently to False.
Reality: torch.no_grad() only temporarily disables gradient tracking within its context. It does not modify requires_grad flags.
Why it matters: Misunderstanding this can cause unexpected behavior when switching between training and evaluation modes.
Quick: Does detach() create a copy of the tensor data? Commit to yes or no.
Common Belief: detach() creates a new copy of the tensor data without gradient tracking.
Reality: detach() creates a new tensor sharing the same data but without gradient tracking; it does not copy data.
Why it matters: Assuming detach() copies data can lead to inefficient memory use or unintended side effects when modifying tensors.
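The data sharing can be verified directly; a short sketch:

```python
import torch

x = torch.tensor([1.0, 2.0], requires_grad=True)
y = x.detach()

# Same underlying storage: no data was copied
print(y.data_ptr() == x.data_ptr())  # True

# In-place edits to the detached view are therefore visible through x
y[0] = 99.0
print(x)  # x now shows 99. in its first slot
```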
Expert Zone
1
Only leaf tensors with requires_grad=True store gradients in .grad after backward(); intermediate tensors participate in the graph but discard their gradients unless you call retain_grad() on them.
2
Changing requires_grad on a tensor that is part of a computation graph can cause errors or unexpected behavior; it's safest to set requires_grad before graph construction.
3
Using requires_grad=False on parameters during fine-tuning can save memory and speed up training, but forgetting to re-enable it when needed can silently break learning.
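A runnable sketch of the leaf vs. intermediate distinction, using retain_grad() to opt in for an intermediate tensor:

```python
import torch

x = torch.tensor(2.0, requires_grad=True)  # leaf tensor
y = x * x                                  # intermediate: tracked, but .grad not kept
y.retain_grad()                            # opt in to keeping the intermediate grad
z = y * 3
z.backward()

print(x.grad)  # tensor(12.) -- dz/dx = 3 * 2x = 12, stored on the leaf
print(y.grad)  # tensor(3.)  -- only kept because of retain_grad()
```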
When NOT to use
Do not use requires_grad=True on inputs or data tensors during inference or evaluation; instead, use torch.no_grad() to save resources. For fixed embeddings or frozen layers, set requires_grad=False to prevent unnecessary gradient computation. Alternatives include using nn.Parameter for trainable parameters and detach() to stop gradient flow selectively.
Production Patterns
In production, requires_grad is often set to False during model evaluation to improve speed and reduce memory. Transfer learning workflows freeze pretrained layers by setting requires_grad=False, then unfreeze selectively for fine-tuning. Custom training loops carefully toggle requires_grad to implement techniques like gradient checkpointing or mixed precision training.
Connections
Automatic Differentiation
requires_grad is the switch that enables automatic differentiation in PyTorch.
Understanding requires_grad clarifies how automatic differentiation selectively tracks computations for gradient calculation.
Transfer Learning
requires_grad controls which model layers learn during transfer learning by freezing or unfreezing parameters.
Knowing requires_grad helps implement transfer learning efficiently by freezing pretrained layers.
Spreadsheet Cell Dependencies
Both track dependencies to update outputs when inputs change.
Like spreadsheet cells recalculating when inputs change, requires_grad tracks tensor operations to compute gradients, showing a shared pattern of dependency tracking.
Common Pitfalls
#1 Expecting gradients on tensors with requires_grad=False.
Wrong approach:

x = torch.tensor([1.0, 2.0, 3.0])
y = x * 2
y.backward(torch.ones_like(y))  # RuntimeError: y does not require grad and has no grad_fn
print(x.grad)                   # Never reached; x.grad would be None anyway
Correct approach:

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x * 2
y.backward(torch.ones_like(y))
print(x.grad)  # tensor([2., 2., 2.])
Root cause:Not setting requires_grad=True means PyTorch does not track operations or compute gradients.
#2 Modifying requires_grad after graph creation causing errors.
Wrong approach:

x = torch.tensor([1.0, 2.0], requires_grad=True)
y = x * 2                # y is a non-leaf tensor inside the graph
y.requires_grad_(False)  # RuntimeError: you can only change requires_grad flags of leaf variables

Correct approach:

x = torch.tensor([1.0, 2.0])
x.requires_grad_(True)   # Toggle on the leaf tensor, before building the graph
y = x * 2

Root cause: requires_grad can only be toggled on leaf tensors; once a tensor is produced by a tracked operation, its role in the graph is fixed.
#3 Using torch.no_grad() expecting a permanent requires_grad change.
Wrong approach:

with torch.no_grad():
    x = torch.tensor([1.0, 2.0], requires_grad=True)
print(x.requires_grad)  # True -- no_grad did not change the flag

Correct approach:

x = torch.tensor([1.0, 2.0])  # requires_grad is False by default; set it explicitly only when gradients are needed

Root cause: no_grad only temporarily disables operation tracking; it does not change the requires_grad flag on tensors.
Key Takeaways
The requires_grad flag controls whether PyTorch tracks operations on tensors for gradient computation.
Setting requires_grad=True is essential for tensors that need to learn during training, like model parameters.
Changing requires_grad after building a computation graph can cause errors; set it before graph construction.
Using torch.no_grad() temporarily disables gradient tracking without changing requires_grad permanently.
Understanding requires_grad helps optimize training, inference, and advanced techniques like freezing layers or selective gradient flow.