PyTorch · ~20 mins

Multi-GPU training in PyTorch - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output · intermediate
Output of PyTorch DataParallel model forward pass
Consider the following PyTorch code snippet using DataParallel for multi-GPU training. What will be the output shape of output after the forward pass?
PyTorch
import torch
import torch.nn as nn

class SimpleModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(10, 5)

    def forward(self, x):
        return self.linear(x)

model = SimpleModel()
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)

model = model.cuda()
input_tensor = torch.randn(16, 10).cuda()
output = model(input_tensor)
print(output.shape)
A. torch.Size([8, 5])
B. torch.Size([32, 5])
C. RuntimeError due to input size mismatch
D. torch.Size([16, 5])
💡 Hint
DataParallel splits the batch across GPUs but concatenates outputs back to original batch size.
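If no GPUs are on hand to verify this, the scatter/gather behavior can be simulated on CPU: DataParallel chunks the input along dim 0, runs each chunk through a model replica, and concatenates the results, so the output batch dimension always matches the input. A minimal sketch, assuming 2 simulated devices:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 5)              # same shape as SimpleModel's layer
x = torch.randn(16, 10)

# Simulate DataParallel on 2 devices: scatter the batch...
chunks = torch.chunk(x, 2, dim=0)     # two chunks of shape (8, 10)
# ...run each chunk through a model replica...
partial = [model(c) for c in chunks]  # each partial output is (8, 5)
# ...then gather by concatenating along the batch dimension.
output = torch.cat(partial, dim=0)
print(output.shape)                   # torch.Size([16, 5])
```

The per-chunk shape (8, 5) is what makes option A tempting, but the gather step restores the original batch size.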
🧠 Conceptual · intermediate
Understanding DistributedDataParallel vs DataParallel
Which statement correctly describes a key difference between PyTorch's DistributedDataParallel (DDP) and DataParallel (DP) for multi-GPU training?
A. DP requires manual gradient synchronization, but DDP automatically splits input batches across GPUs.
B. DDP runs model replicas in separate processes and synchronizes gradients, while DP runs replicas in a single process, splitting input batches.
C. DDP only works on a single GPU, while DP supports multiple GPUs on one machine.
D. DP uses multiple machines for training, while DDP is limited to one machine.
💡 Hint
Think about process management and how input batches are handled.
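The process-based design can be exercised even without GPUs: DDP runs on CPU with the gloo backend, one process per replica. A minimal single-process sketch (world_size=1, assuming port 29500 is free):

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# Each DDP replica lives in its own process; here we start a trivial
# one-process group on CPU using the gloo backend.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")  # assumed free port
dist.init_process_group("gloo", rank=0, world_size=1)

model = nn.Linear(10, 5)
ddp_model = DDP(model)           # gradients are all-reduced across processes
out = ddp_model(torch.randn(16, 10))
print(out.shape)                 # torch.Size([16, 5])

dist.destroy_process_group()
```

In real multi-GPU training this script would be launched once per GPU (e.g. via torchrun), each process feeding its replica its own shard of the data, rather than one process splitting a batch as DP does.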
Hyperparameter · advanced
Choosing batch size for multi-GPU training
You want to train a model on 4 GPUs using DistributedDataParallel. Your single-GPU batch size is 32. What is the best practice for setting the batch size when using 4 GPUs?
A. Keep the batch size at 32 and let each GPU process the full batch independently.
B. Set the per-GPU batch size to 8 so the total batch size across all 4 GPUs stays 32.
C. Set the batch size to 128 so each GPU processes 32 samples (total batch size = 4 × 32).
D. Set the batch size to 32 and manually average gradients across GPUs.
💡 Hint
Consider how DistributedDataParallel splits data across GPUs.
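The split can be inspected without any GPUs, since DistributedSampler accepts explicit num_replicas/rank arguments. A sketch assuming 4 ranks, a per-GPU batch size of 32, and a toy dataset of 1280 samples:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

dataset = TensorDataset(torch.randn(1280, 10))
world_size, per_gpu_batch = 4, 32

# Each rank gets a disjoint quarter of the dataset...
sampler = DistributedSampler(dataset, num_replicas=world_size, rank=0)
loader = DataLoader(dataset, batch_size=per_gpu_batch, sampler=sampler)

print(len(sampler))                  # 320 samples per rank (1280 / 4)
batch, = next(iter(loader))
print(batch.shape)                   # torch.Size([32, 10]) per step on each rank
print(world_size * per_gpu_batch)    # 128 -- the effective global batch size
```

Each rank's DataLoader is configured with the per-GPU batch size; the global batch size per optimizer step is that value times the number of ranks, which is why scaling the learning rate is often paired with this change.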
🔧 Debug · advanced
Identifying error in multi-GPU model saving
You trained a model using nn.DataParallel and saved it with torch.save(model.state_dict(), 'model.pth'). When loading the model later without DataParallel, you get a key mismatch error. What is the cause?
A. The saved state dict keys are prefixed with 'module.' due to DataParallel wrapping.
B. The model was saved on CPU but loaded on GPU, causing a mismatch.
C. The model architecture changed between saving and loading, causing a mismatch.
D. The saved file is corrupted and cannot be loaded properly.
💡 Hint
DataParallel adds a prefix to parameter names in the state dict.
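A common fix is to strip the prefix when loading (or, better, to save model.module.state_dict() in the first place). A sketch, runnable even without GPUs since DataParallel still registers the wrapped model under the attribute name module:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 5)
wrapped = nn.DataParallel(model)

state = wrapped.state_dict()
print(list(state))                 # ['module.weight', 'module.bias']

# Strip the 'module.' prefix so a plain (unwrapped) model can load it.
clean = {k.removeprefix("module."): v for k, v in state.items()}
plain = nn.Linear(10, 5)
plain.load_state_dict(clean)       # loads without key-mismatch errors

# Better: save the unwrapped module's state dict to begin with, e.g.
# torch.save(wrapped.module.state_dict(), 'model.pth')
```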
Metrics · expert
Interpreting training speedup with multi-GPU training
You train a model on 1 GPU and it takes 100 minutes per epoch. Using 4 GPUs with DistributedDataParallel, the epoch time reduces to 30 minutes. Which statement best explains this result?
A. The speedup is less than 4× due to communication overhead and synchronization costs.
B. The speedup is exactly 4× because 4 GPUs process data independently without overhead.
C. The speedup is more than 4× because GPUs work better together than alone.
D. The speedup is unrelated to the GPUs and caused by random fluctuations.
💡 Hint
Consider overheads in multi-GPU distributed training.
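The numbers in the question work out to a sub-linear speedup, which is the expected pattern for distributed data parallelism; a quick calculation:

```python
single_gpu_time = 100   # minutes per epoch on 1 GPU
multi_gpu_time = 30     # minutes per epoch on 4 GPUs
num_gpus = 4

speedup = single_gpu_time / multi_gpu_time   # 3.33x -- less than the ideal 4x
efficiency = speedup / num_gpus              # ~83% scaling efficiency

print(f"speedup: {speedup:.2f}x")
print(f"scaling efficiency: {efficiency:.0%}")
# The gap to 4x is the cost of gradient all-reduce, synchronization
# barriers, and any per-process data-loading overhead.
```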