
Batching and shuffling in TensorFlow - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output
intermediate
What is the output shape after batching?
Given a TensorFlow dataset of 100 samples, each a vector of length 10, what is the shape of one batch after batching the dataset with a batch size of 20?
TensorFlow
import tensorflow as tf

# Create dataset of 100 samples, each sample shape (10,)
dataset = tf.data.Dataset.from_tensor_slices(tf.random.uniform([100, 10]))

# Batch with size 20
batched_dataset = dataset.batch(20)

for batch in batched_dataset.take(1):
    print(batch.shape)
A. (5, 10)
B. (10, 20)
C. (100, 10)
D. (20, 10)
💡 Hint
Batching groups samples along the first dimension.
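As a rough illustration of how batch() stacks samples along a new leading dimension, here is a minimal sketch on a separate toy dataset. The sizes (7 samples of length 3, batch size 3) are assumptions chosen so that the last batch comes out smaller than the others; they are not taken from the question above.

```python
import tensorflow as tf

# Hypothetical toy data: 7 samples, each a vector of length 3.
samples = tf.reshape(tf.range(21, dtype=tf.float32), [7, 3])
dataset = tf.data.Dataset.from_tensor_slices(samples)

# batch() stacks consecutive samples along a new first dimension.
shapes = [tuple(batch.shape.as_list()) for batch in dataset.batch(3)]
print(shapes)  # [(3, 3), (3, 3), (1, 3)]

# drop_remainder=True discards the final, smaller batch.
full_shapes = [
    tuple(batch.shape.as_list())
    for batch in dataset.batch(3, drop_remainder=True)
]
print(full_shapes)  # [(3, 3), (3, 3)]
```

When the dataset size is not a multiple of the batch size, the last batch is smaller unless drop_remainder=True is passed.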
🧠 Conceptual
intermediate
Why shuffle a dataset before training?
What is the main reason to shuffle a dataset before training a machine learning model?
A. To ensure the model sees data in a random order, preventing learning bias from data order
B. To reduce the dataset size by removing duplicates
C. To increase the batch size automatically
D. To normalize the input features
💡 Hint
Think about how data order might affect learning.
Hyperparameter
advanced
Choosing shuffle buffer size
In TensorFlow, the shuffle() method of tf.data.Dataset requires a buffer_size argument. What is the effect of increasing the shuffle buffer size?
A. It decreases randomness but speeds up training
B. It increases randomness of shuffling but uses more memory
C. It changes the batch size automatically
D. It normalizes the dataset features
💡 Hint
Think about how the buffer size determines the number of samples held in memory and available to be drawn at each step.
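To make the role of the buffer concrete, the sketch below compares two extreme settings on a small range dataset. The helper name first_pass, the dataset size and the seed are assumptions made for illustration only.

```python
import tensorflow as tf

dataset = tf.data.Dataset.range(10)

def first_pass(ds):
    """Collect one full pass over the dataset as a list of Python ints."""
    return [int(x) for x in ds.as_numpy_iterator()]

# A buffer of size 1 holds a single element, so there is nothing to choose
# between and the original order is preserved.
no_shuffle = first_pass(dataset.shuffle(buffer_size=1))
print(no_shuffle)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# A buffer as large as the dataset holds every element at once, so any
# element can be emitted at any position.
full_shuffle = first_pass(dataset.shuffle(buffer_size=10, seed=0))
print(sorted(full_shuffle) == list(range(10)))  # True: same elements, new order
```

The buffer keeps buffer_size elements in memory at a time, which is why a larger buffer costs more memory.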
🔧 Debug
advanced
Why does this TensorFlow dataset not shuffle properly?
Consider this code snippet. Why might the output order not be fully randomized?
TensorFlow
import tensorflow as tf

raw_data = tf.data.Dataset.range(10)
shuffled_data = raw_data.shuffle(buffer_size=5)

for item in shuffled_data:
    print(item.numpy())
A. Because the shuffle buffer size is smaller than the dataset size, limiting randomness
B. Because shuffle() requires batch() to work properly
C. Because the dataset is not batched before shuffling
D. Because shuffle() only works on datasets with more than 100 samples
💡 Hint
Think about how shuffle buffer size relates to dataset size.
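One way to observe the relationship between buffer size and dataset size is to check how far an element can move forward. The sketch below assumes the documented behavior of shuffle(): fill a buffer, sample one element from it at random, then refill the freed slot with the next input element. The helper name positions_respect_buffer and the seeds are hypothetical.

```python
import tensorflow as tf

buffer_size = 5
raw_data = tf.data.Dataset.range(10)

def positions_respect_buffer(order, buffer_size):
    # The element emitted at output position k was sampled from a buffer that
    # has only seen the first k + buffer_size input elements, so its value
    # (equal to its input index here) must be below k + buffer_size.
    return all(value < k + buffer_size for k, value in enumerate(order))

results = []
for seed in range(20):  # arbitrary seeds, chosen only for illustration
    shuffled = raw_data.shuffle(buffer_size=buffer_size, seed=seed)
    order = [int(x) for x in shuffled.as_numpy_iterator()]
    results.append(positions_respect_buffer(order, buffer_size))

print(all(results))
```

With a buffer of 5 on 10 elements, the first value printed can never be 5 or greater, so some orderings are impossible.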
Model Choice
expert
Best practice for shuffling and batching in TensorFlow pipeline
You want to prepare a TensorFlow dataset for training a neural network. Which pipeline order is best for performance and correctness?
A. Shuffle and batch simultaneously using batch(shuffle=True)
B. Batch the dataset first, then shuffle the batches
C. Shuffle the dataset first, then batch it
D. Neither shuffle nor batch the dataset
💡 Hint
Consider how shuffling affects individual samples versus batches.
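The difference between shuffling samples and shuffling batches can be seen by applying the two operations in both orders to a small range dataset. This is a minimal sketch; the dataset size, batch size, buffer sizes and seed are assumptions for illustration.

```python
import tensorflow as tf

dataset = tf.data.Dataset.range(12)

# shuffle() before batch(): individual samples are reordered first, so each
# batch can contain samples from anywhere in the dataset.
shuffle_then_batch = [
    b.tolist()
    for b in dataset.shuffle(buffer_size=12, seed=1).batch(4).as_numpy_iterator()
]

# batch() before shuffle(): only whole batches are reordered, so every batch
# still holds the same four consecutive samples.
batch_then_shuffle = [
    b.tolist()
    for b in dataset.batch(4).shuffle(buffer_size=3, seed=1).as_numpy_iterator()
]

print(shuffle_then_batch)
print(batch_then_shuffle)
```

In the second pipeline each printed row is always one of [0, 1, 2, 3], [4, 5, 6, 7] or [8, 9, 10, 11]; only the order of the rows changes.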