Why do we often create custom data pipelines when working with real-world data in PyTorch?
Think about how messy real data can be and what that means for loading it.
Real data usually needs cleaning, transforming, or special handling before it can be used for training. Custom pipelines let us do this.
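As a concrete illustration, here is a minimal sketch of a custom Dataset that builds cleaning and normalization into loading. The class name, records, and cleaning steps are illustrative, not part of any standard API:

```python
from torch.utils.data import Dataset

class CleaningDataset(Dataset):
    """Illustrative Dataset that cleans raw text records on access."""

    def __init__(self, raw_records):
        # Drop empty records up front (a simple "cleaning" step).
        self.records = [r for r in raw_records if r.strip()]

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        # Normalize each record as it is loaded.
        return self.records[idx].strip().lower()

dataset = CleaningDataset(["  Hello ", "", "WORLD"])
print(len(dataset))   # 2 -- the empty record was dropped
print(dataset[0])     # "hello"
```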
What will be the output of the following code snippet?
```python
from torch.utils.data import Dataset

class MyDataset(Dataset):
    def __init__(self):
        self.data = [10, 20, 30]

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx] * 2

dataset = MyDataset()
print(dataset[1])
```
Look at what __getitem__ returns for index 1.
The dataset stores [10, 20, 30]; index 1 holds 20. __getitem__ returns the stored value times 2, so the output is 40.
You have a dataset of images with varying sizes and some corrupted files. Which data pipeline approach best handles this real data scenario?
Think about how to handle corrupted files and different image sizes automatically.
A custom Dataset can check each image for corruption and resize images dynamically, making the pipeline robust to real data issues.
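A hedged sketch of such a robust Dataset is below. To keep it self-contained, corrupt files are simulated as None entries; real code would instead wrap the file load (e.g. PIL's Image.open) in try/except. The fallback-to-next-sample strategy is one option among several (you could also filter corrupt paths up front):

```python
import torch
from torch.utils.data import Dataset

class RobustImageDataset(Dataset):
    """Sketch: tolerate corrupt entries and resize varying-size images."""

    def __init__(self, images, size=(4, 4)):
        self.images = images  # list of 2D tensors; None simulates a corrupt file
        self.size = size

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        img = self.images[idx]
        if img is None:
            # Corrupt file: fall back to the next valid sample.
            return self.__getitem__((idx + 1) % len(self.images))
        # Resize dynamically so every sample has the same shape.
        img = img[None, None]  # (1, 1, H, W) as required by interpolate
        img = torch.nn.functional.interpolate(
            img, size=self.size, mode="bilinear", align_corners=False
        )
        return img[0, 0]

images = [torch.rand(3, 5), None, torch.rand(6, 2)]
dataset = RobustImageDataset(images)
print(dataset[1].shape)  # torch.Size([4, 4]) -- corrupt entry replaced
```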
When using a custom data pipeline in PyTorch, how does increasing the batch size affect training?
Think about how batch size relates to memory and training speed.
Larger batch sizes consume more memory per step and often increase training throughput, but batches that are too large can hurt generalization.
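The memory/speed trade-off can be seen directly: a larger batch size means each batch tensor is bigger, but the loader yields fewer batches per epoch (the dataset below is a toy example for illustration):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 100 samples; larger batches mean fewer optimizer steps per epoch,
# but each step holds more data in memory at once.
dataset = TensorDataset(torch.arange(100).float())

for batch_size in (10, 50):
    loader = DataLoader(dataset, batch_size=batch_size)
    print(batch_size, len(loader))  # 10 -> 10 batches, 50 -> 2 batches
```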
Consider this custom Dataset and DataLoader code. The training hangs indefinitely. What is the most likely cause?
```python
from torch.utils.data import Dataset, DataLoader

class HangDataset(Dataset):
    def __init__(self):
        self.data = list(range(5))

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        while True:
            pass  # Infinite loop

dataset = HangDataset()
loader = DataLoader(dataset, batch_size=2, num_workers=2)

for batch in loader:
    print(batch)
```
Look carefully at the __getitem__ method code.
The infinite loop inside __getitem__ means no sample is ever returned, so the DataLoader workers block forever and training hangs.
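A corrected version simply returns the sample. This sketch uses num_workers=0 (loading in the main process) to keep it lightweight; note that DataLoader also accepts a timeout argument, which, when worker processes are used, raises a RuntimeError instead of waiting forever if a worker stalls:

```python
from torch.utils.data import Dataset, DataLoader

class FixedDataset(Dataset):
    def __init__(self):
        self.data = list(range(5))

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]  # return the sample instead of looping

dataset = FixedDataset()
loader = DataLoader(dataset, batch_size=2, num_workers=0)
for batch in loader:
    print(batch)  # tensor([0, 1]), tensor([2, 3]), tensor([4])
```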