MixUp is a data augmentation technique used in training neural networks. What is its primary goal?
Think about how MixUp changes the training data rather than the model or training speed.
MixUp creates new samples by mixing two inputs and their labels, which helps the model learn smoother decision boundaries and generalize better.
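As a concrete sketch (all values chosen purely for illustration): with one-hot labels, mixing two samples with lambda = 0.7 produces a soft label that credits both classes.

```python
import numpy as np

lam = 0.7  # illustrative mixing coefficient
x1 = np.full((3, 4, 4), 1.0)    # toy "image" 1
x2 = np.full((3, 4, 4), 0.0)    # toy "image" 2
y1 = np.array([1.0, 0.0, 0.0])  # one-hot label: class 0
y2 = np.array([0.0, 1.0, 0.0])  # one-hot label: class 1

x_mix = lam * x1 + (1 - lam) * x2  # every pixel becomes 0.7
y_mix = lam * y1 + (1 - lam) * y2  # soft label [0.7, 0.3, 0.0]
print(y_mix)
```

The mixed label is no longer a hard class assignment, which is what pushes the model toward smoother decision boundaries.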
Given two batches of images X1 and X2 each with shape (32, 3, 64, 64), and a mixing coefficient lambda = 0.4, what is the shape of the mixed batch X_mix = lambda * X1 + (1 - lambda) * X2?
import torch

X1 = torch.randn(32, 3, 64, 64)
X2 = torch.randn(32, 3, 64, 64)
lambda_ = 0.4
X_mix = lambda_ * X1 + (1 - lambda_) * X2
print(X_mix.shape)  # torch.Size([32, 3, 64, 64])
MixUp combines images element-wise without changing batch size or channels.
MixUp linearly combines two batches of the same shape, so the output shape remains the same as the input batches.
In MixUp, a Beta distribution parameter alpha is used to sample the mixing coefficient lambda. What effect does increasing alpha have?
Recall how the shape of the Beta distribution changes when both parameters are greater than 1.
Increasing alpha concentrates the Beta(alpha, alpha) distribution around 0.5, so sampled lambda values mix the two samples more evenly; small alpha instead pushes lambda toward 0 or 1, which is close to no mixing at all.
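A quick empirical check (the alpha values here are chosen arbitrarily): sampling lambda from Beta(alpha, alpha) shows the spread around 0.5 shrinking as alpha grows.

```python
import numpy as np

rng = np.random.default_rng(0)
for alpha in (0.2, 1.0, 5.0):
    lam = rng.beta(alpha, alpha, size=100_000)
    # mean stays near 0.5 by symmetry; the standard deviation shrinks as alpha grows
    print(f"alpha={alpha}: mean={lam.mean():.3f}, std={lam.std():.3f}")
```

For Beta(a, a) the standard deviation is sqrt(1 / (4 * (2a + 1))), so it drops from about 0.42 at a = 0.2 to about 0.15 at a = 5.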
When training a neural network with MixUp, how does the training loss curve typically behave compared to training without MixUp?
Think about how MixUp creates harder training examples.
MixUp creates blended samples that are harder to fit exactly, so training loss is typically higher and decreases more slowly, but this regularization leads to better generalization and higher test accuracy.
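One way to see why the loss stays higher: with a soft mixed label, cross-entropy has a nonzero floor equal to the label's entropy, so even a perfect prediction cannot drive the loss to zero. A small sketch (the label values are illustrative):

```python
import torch
import torch.nn.functional as F

y_mix = torch.tensor([[0.6, 0.4, 0.0]])    # mixed soft label
# a "perfect" prediction: softmax(logits) reproduces y_mix exactly
logits = torch.log(y_mix.clamp_min(1e-9))
loss = -(y_mix * F.log_softmax(logits, dim=1)).sum(dim=1)
print(loss.item())  # ~0.673: the entropy of the soft label, not 0
```

With a hard one-hot label the same construction would give a loss of 0, which is why MixUp curves sit above the non-MixUp ones.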
Consider this PyTorch MixUp code snippet:

import numpy as np
import torch

def mixup_data(x, y, alpha=1.0):
    if alpha > 0:
        lam = np.random.beta(alpha, alpha)
    else:
        lam = 1
    batch_size = x.size()[0]
    index = torch.randperm(batch_size)
    mixed_x = lam * x + (1 - lam) * x[index, :]
    mixed_y = lam * y + (1 - lam) * y[index]
    return mixed_x, mixed_y

x = torch.randn(16, 3, 32, 32)
y = torch.randint(0, 10, (16,))
mixed_x, mixed_y = mixup_data(x, y, alpha=0.4)

What causes the runtime error?
Check the data types of y and how they interact with float multiplication.
Labels y are an integer (long) tensor. Multiplying an integer tensor by a non-integer float raises a runtime error in older PyTorch versions, and even where type promotion succeeds, fractionally mixed class indices are meaningless. The labels must be converted to float, typically as one-hot vectors, before mixing.
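A minimal corrected version (the `num_classes` parameter is my addition, not part of the original snippet): convert the integer class labels to one-hot float vectors before mixing.

```python
import numpy as np
import torch
import torch.nn.functional as F

def mixup_data(x, y, num_classes, alpha=1.0):
    lam = np.random.beta(alpha, alpha) if alpha > 0 else 1.0
    index = torch.randperm(x.size(0))
    # convert integer labels to float one-hot vectors so they can be blended
    y_onehot = F.one_hot(y, num_classes).float()
    mixed_x = lam * x + (1 - lam) * x[index]
    mixed_y = lam * y_onehot + (1 - lam) * y_onehot[index]
    return mixed_x, mixed_y

x = torch.randn(16, 3, 32, 32)
y = torch.randint(0, 10, (16,))
mixed_x, mixed_y = mixup_data(x, y, num_classes=10, alpha=0.4)
print(mixed_x.shape, mixed_y.shape)
```

Each row of `mixed_y` is now a valid soft-label distribution (non-negative, summing to 1), which is what a cross-entropy loss over soft targets expects.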