Practice


1. What is the main reason for using checkpointing during PyTorch model training?

easy

A. To save the model's current state so training can resume later without loss

B. To speed up the training by skipping some layers

C. To reduce the size of the training dataset

D. To automatically tune hyperparameters during training

Solution

Step 1: Understand checkpointing purpose
Checkpointing saves the training state at a point in time: the model's weights and, typically, the optimizer state and epoch counter.
Step 2: Connect checkpointing to training progress
This allows training to stop and resume later without losing progress.
Final Answer:
To save the model's current state so training can resume later without loss -> Option A
Quick Check:
Checkpointing = Save progress [OK]

Hint: Checkpointing means saving progress to continue later [OK]

Common Mistakes:

Thinking checkpointing speeds up training
Confusing checkpointing with data reduction
Assuming checkpointing tunes hyperparameters

2. Which of the following is the correct PyTorch code snippet to save a checkpoint?

easy

A. model.load_state_dict(torch.save('checkpoint.pth'))

B. torch.save(model.state_dict(), 'checkpoint.pth')

C. torch.load('checkpoint.pth')

D. optimizer.save('checkpoint.pth')

Solution

Step 1: Identify saving function
torch.save() serializes an object, such as a dictionary of model weights, to a file.
Step 2: Check correct usage for saving model state
model.state_dict() returns a dictionary of the model's learnable parameters and buffers; passing it to torch.save() writes the checkpoint to disk.
Final Answer:
torch.save(model.state_dict(), 'checkpoint.pth') -> Option B
Quick Check:
Save model weights = torch.save(state_dict) [OK]

Hint: Use torch.save with model.state_dict() to save checkpoint [OK]

Common Mistakes:

Using torch.load instead of torch.save to save
Calling optimizer.save(), a method PyTorch optimizers do not have
Confusing load_state_dict with saving
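
As a rough illustration of the answer to question 2, the sketch below saves a state_dict and loads it back into a fresh model. The nn.Linear model and the temporary directory are assumptions made only so the example is self-contained; they are not part of the quiz.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Hypothetical stand-in model; any nn.Module is saved the same way.
model = nn.Linear(4, 2)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "checkpoint.pth")

    # Save only the parameters and buffers, not the whole module object.
    torch.save(model.state_dict(), path)

    # Restore into a fresh model with the same architecture.
    restored = nn.Linear(4, 2)
    restored.load_state_dict(torch.load(path))

# The restored weights should be identical to the saved ones.
weights_match = torch.equal(model.weight, restored.weight)
print(weights_match)
```

Note that torch.load() only reads the file; the weights reach the model through load_state_dict().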

3. Given this code snippet, what will be printed after loading the checkpoint?

model = MyModel()
optimizer = torch.optim.Adam(model.parameters())
checkpoint = torch.load('checkpoint.pth')
model.load_state_dict(checkpoint['model_state'])
optimizer.load_state_dict(checkpoint['optimizer_state'])
epoch = checkpoint['epoch']
print(epoch)

medium

A. An error because checkpoint keys are missing

B. The total number of model parameters

C. The optimizer learning rate

D. The epoch number saved in the checkpoint

Solution

Step 1: Understand checkpoint contents
The code assumes the checkpoint was saved as a dictionary with the keys 'model_state', 'optimizer_state', and 'epoch'.
Step 2: Identify printed value
The variable epoch is assigned checkpoint['epoch'], so print(epoch) outputs the epoch number stored when the checkpoint was saved.
Final Answer:
The epoch number saved in the checkpoint -> Option D
Quick Check:
Print epoch from checkpoint = epoch number [OK]

Hint: Print shows saved epoch from checkpoint dictionary [OK]

Common Mistakes:

Thinking print shows model parameters count
Confusing optimizer state with epoch
Assuming missing keys cause error here
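
The snippet in question 3 only works if the checkpoint was written with matching keys. Below is a minimal sketch of both sides, assuming a tiny hypothetical MyModel and an arbitrary epoch value of 7; neither comes from the quiz.

```python
import os
import tempfile

import torch
import torch.nn as nn


class MyModel(nn.Module):
    """Hypothetical tiny model standing in for the quiz's MyModel."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)


model = MyModel()
optimizer = torch.optim.Adam(model.parameters())

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "checkpoint.pth")

    # Save side: the key names must match what the loading code reads.
    torch.save(
        {
            "model_state": model.state_dict(),
            "optimizer_state": optimizer.state_dict(),
            "epoch": 7,  # arbitrary example value
        },
        path,
    )

    # Load side: the same steps as the snippet in the question.
    checkpoint = torch.load(path)
    model.load_state_dict(checkpoint["model_state"])
    optimizer.load_state_dict(checkpoint["optimizer_state"])
    epoch = checkpoint["epoch"]

print(epoch)  # prints 7
```

If the file had been saved with different key names, the dictionary lookups would raise a KeyError, which is the situation option A describes.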

4. You tried to resume training but got an error: RuntimeError: Error(s) in loading state_dict. What is the most likely cause related to checkpointing?

medium

A. The training data was modified after checkpointing

B. The checkpoint file was saved with torch.load instead of torch.save

C. The model architecture changed after saving the checkpoint

D. The optimizer state was not saved in the checkpoint

Solution

Step 1: Understand error meaning
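By default, load_state_dict() raises this RuntimeError when the saved state_dict has missing or unexpected keys, or tensors whose shapes differ from the current model's.
Step 2: Connect error to checkpoint cause
If the model architecture changed after saving, the saved parameter names or shapes no longer match the model, which triggers this error.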
Final Answer:
The model architecture changed after saving the checkpoint -> Option C
Quick Check:
State_dict error = architecture mismatch [OK]

Hint: Mismatched model layers cause state_dict loading errors [OK]

Common Mistakes:

Blaming a mix-up of the save and load functions for this error
Assuming missing optimizer state causes this error
Blaming training data changes for state_dict error
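
The error in question 4 can be reproduced in a few lines. The sketch below uses two hypothetical nn.Sequential models whose hidden layer sizes differ; the layer sizes are illustrative assumptions.

```python
import torch.nn as nn

# Hypothetical "before" and "after" architectures: the hidden width changed.
saved_model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
changed_model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 2))

# State as it would have been stored in the checkpoint.
state = saved_model.state_dict()

try:
    # Strict loading (the default) rejects tensors whose shapes differ.
    changed_model.load_state_dict(state)
    error_message = None
except RuntimeError as err:
    error_message = str(err)

print(error_message)
```

The message names each mismatched parameter, which is usually the quickest way to find what changed in the architecture.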

5. You want to checkpoint your training every 5 epochs to avoid losing progress. Which approach best preserves training progress including optimizer state and epoch count?

hard

A. Save a dictionary with model.state_dict(), optimizer.state_dict(), and current epoch number

B. Save only model.state_dict() every 5 epochs

C. Save optimizer.state_dict() and epoch number but not model weights

D. Save the training data batch every 5 epochs

Solution

Step 1: Identify what preserves full training state
Saving model weights, optimizer state, and epoch number allows full resume.
Step 2: Compare options
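Saving only the model weights drops the optimizer state (for example, Adam's running moment estimates); saving the optimizer state and epoch without the model weights cannot restore the model; saving a data batch preserves no training progress at all.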
Final Answer:
Save a dictionary with model.state_dict(), optimizer.state_dict(), and current epoch number -> Option A
Quick Check:
Checkpoint = model + optimizer + epoch [OK]

Hint: Checkpoint all: model, optimizer, and epoch for full resume [OK]

Common Mistakes:

Saving only model weights loses optimizer progress
Ignoring epoch number causes restart from zero
Saving training data batch does not preserve model state
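
One way the approach in question 5 might look in practice is sketched below. The toy model, the random data, and the helper names save_checkpoint and train are hypothetical and chosen only for illustration; the dictionary keys follow the ones used in question 3.

```python
import os
import tempfile

import torch
import torch.nn as nn

CHECKPOINT_EVERY = 5  # epochs between checkpoints, as in the question


def save_checkpoint(path, model, optimizer, epoch):
    """Store everything needed to resume: weights, optimizer state, epoch."""
    torch.save(
        {
            "model_state": model.state_dict(),
            "optimizer_state": optimizer.state_dict(),
            "epoch": epoch,
        },
        path,
    )


def train(model, optimizer, path, start_epoch, num_epochs):
    """Toy loop on random data; saves every CHECKPOINT_EVERY epochs."""
    loss_fn = nn.MSELoss()
    for epoch in range(start_epoch, num_epochs + 1):
        inputs, targets = torch.randn(16, 4), torch.randn(16, 2)
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        optimizer.step()
        if epoch % CHECKPOINT_EVERY == 0:
            save_checkpoint(path, model, optimizer, epoch)


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "checkpoint.pth")

    model = nn.Linear(4, 2)
    optimizer = torch.optim.Adam(model.parameters())
    # Runs epochs 1 to 7, so the only checkpoint is written at epoch 5.
    train(model, optimizer, path, start_epoch=1, num_epochs=7)

    # Simulate a restart: build fresh objects, then restore from the file.
    resumed_model = nn.Linear(4, 2)
    resumed_optimizer = torch.optim.Adam(resumed_model.parameters())
    checkpoint = torch.load(path)
    resumed_model.load_state_dict(checkpoint["model_state"])
    resumed_optimizer.load_state_dict(checkpoint["optimizer_state"])
    next_epoch = checkpoint["epoch"] + 1

print(next_epoch)  # prints 6
```

Resuming would then call train() with start_epoch=next_epoch, so the epochs completed before the checkpoint are not repeated.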

Epoch	Loss ↓	Accuracy ↑	Observation
1	0.65	0.60	Training started, loss high, accuracy low
2	0.45	0.75	Checkpoint saved, loss decreased, accuracy improved
3	0.35	0.82	Training continues, better performance
4	0.30	0.85	Checkpoint saved, loss lower, accuracy higher

Why checkpointing preserves progress in PyTorch - Model Pipeline Impact
