Challenge - 5 Problems
Batching and Shuffling Master
❓ Predict Output
intermediate
What is the output shape after batching?
Given a TensorFlow dataset of 100 samples, each sample is a vector of length 10. If we batch the dataset with batch size 20, what will be the shape of one batch?
TensorFlow
import tensorflow as tf

# Create dataset of 100 samples, each sample shape (10,)
dataset = tf.data.Dataset.from_tensor_slices(tf.random.uniform([100, 10]))

# Batch with size 20
batched_dataset = dataset.batch(20)

for batch in batched_dataset.take(1):
    print(batch.shape)
💡 Hint
Batching groups samples along the first dimension.
Explanation
Batching groups 20 samples together. Each sample has shape (10,), so the batch shape is (20, 10).
🧠 Conceptual
intermediate
Why shuffle a dataset before training?
What is the main reason to shuffle a dataset before training a machine learning model?
💡 Hint
Think about how data order might affect learning.
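Explanation
Shuffling breaks up any ordering in the data (for example, samples sorted by class), so each batch is closer to a representative sample of the whole dataset. Without it, the model can learn patterns tied to the order of the data, which biases training and reduces generalization.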
❓ Hyperparameter
advanced
Choosing shuffle buffer size
In TensorFlow, the shuffle() method requires a buffer_size argument. What is the effect of increasing the shuffle buffer size?
💡 Hint
Think about how buffer size affects the number of samples held before shuffling.
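Explanation
A larger buffer holds more samples in memory and draws each output at random from that larger pool, so the resulting order is closer to a uniform shuffle, at the cost of more RAM. A buffer at least as large as the dataset gives a full shuffle.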
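The effect of the buffer size can be sketched in plain Python. This is a simplified model of a fill-and-replace shuffle buffer, not TensorFlow's actual implementation; the function name `buffered_shuffle` and its arguments are hypothetical and exist only for illustration.

```python
import random


def buffered_shuffle(items, buffer_size, seed=None):
    """Simplified model of a shuffle buffer (illustrative, not TensorFlow code)."""
    rng = random.Random(seed)
    buffer, output = [], []
    for item in items:
        # Fill the buffer until it holds buffer_size items.
        if len(buffer) < buffer_size:
            buffer.append(item)
            continue
        # Emit a random buffered item and put the new item in its slot.
        idx = rng.randrange(buffer_size)
        output.append(buffer[idx])
        buffer[idx] = item
    # Input exhausted: emit whatever is left in random order.
    rng.shuffle(buffer)
    output.extend(buffer)
    return output


data = list(range(10))
print(buffered_shuffle(data, buffer_size=1, seed=0))   # order unchanged: [0, 1, ..., 9]
print(buffered_shuffle(data, buffer_size=5, seed=0))   # only locally mixed
print(buffered_shuffle(data, buffer_size=10, seed=0))  # any permutation is possible
```

In this model, the item emitted at output position i always comes from the first i + buffer_size inputs, which is why a small buffer can only mix samples locally, while a buffer covering the whole dataset removes that restriction at the price of holding every sample in memory.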
🔧 Debug
advanced
Why does this TensorFlow dataset not shuffle properly?
Consider this code snippet:
import tensorflow as tf
raw_data = tf.data.Dataset.range(10)
shuffled_data = raw_data.shuffle(buffer_size=5)
for item in shuffled_data:
    print(item.numpy())
Why might the output order not be fully randomized?
💡 Hint
Think about how shuffle buffer size relates to dataset size.
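Explanation
shuffle() fills a buffer with the first buffer_size elements, emits one of them at random, and refills that slot with the next element. With buffer_size=5 on a 10-element dataset, the first value printed can only be one of 0 to 4, so full randomization is not guaranteed. Setting buffer_size to at least the dataset size gives a full shuffle.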
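A quick way to see the limitation is to look at the first element emitted over several runs. The sketch below assumes TensorFlow 2.x in eager mode; the helper name `first_items` and the number of trials are illustrative choices, not part of the original snippet.

```python
import tensorflow as tf


def first_items(buffer_size, trials=20):
    """Collect the first element emitted by shuffle() across several seeded runs."""
    firsts = []
    for seed in range(trials):
        ds = tf.data.Dataset.range(10).shuffle(buffer_size=buffer_size, seed=seed)
        firsts.append(int(next(iter(ds.take(1))).numpy()))
    return firsts


# With a buffer of 5, the first element is always drawn from 0..4.
small_buffer_firsts = first_items(buffer_size=5)
print(small_buffer_firsts)

# A buffer as large as the dataset lets any element come first.
full = tf.data.Dataset.range(10)
full_buffer_size = int(full.cardinality().numpy())
full_shuffle = [int(x) for x in full.shuffle(full_buffer_size, seed=0).as_numpy_iterator()]
print(full_shuffle)
```

Using `cardinality()` works here because the size of a `range` dataset is known; for datasets whose size is unknown, the buffer size has to be chosen by hand.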
❓ Model Choice
expert
Best practice for shuffling and batching in TensorFlow pipeline
You want to prepare a TensorFlow dataset for training a neural network. Which pipeline order is best for performance and correctness?
💡 Hint
Consider how shuffling affects individual samples versus batches.
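Explanation
Shuffling before batching randomizes individual samples before they are grouped, so each batch contains a different mix of samples, which improves training quality. Batching first and then shuffling only reorders whole batches; the samples inside each batch stay together.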
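The difference between the two orders can be checked on a small dataset. This is a minimal sketch assuming TensorFlow 2.x; the dataset of 12 integers, `BATCH_SIZE`, and the variable names are illustrative values chosen so the batches are easy to read.

```python
import tensorflow as tf

BATCH_SIZE = 4  # illustrative value
samples = tf.data.Dataset.range(12)

# Shuffle individual samples first, then group them into batches.
shuffle_then_batch = samples.shuffle(buffer_size=12, seed=0).batch(BATCH_SIZE)

# Reverse order: batches are formed first, so only whole batches are reordered.
batch_then_shuffle = samples.batch(BATCH_SIZE).shuffle(buffer_size=3, seed=0)

mixed_batches = [b.tolist() for b in shuffle_then_batch.as_numpy_iterator()]
fixed_batches = [b.tolist() for b in batch_then_shuffle.as_numpy_iterator()]

print(mixed_batches)  # samples from anywhere in the dataset can share a batch
print(fixed_batches)  # each batch is still a consecutive run such as [4, 5, 6, 7]
```

In the second pipeline every batch keeps the same members on every pass, so the model always sees those samples together; only the order in which the batches arrive changes.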