TensorFlow · ML · ~20 mins

tf.data.Dataset creation in TensorFlow - ML Experiment: Train & Evaluate

Experiment - tf.data.Dataset creation
Problem: You want to create a TensorFlow dataset from NumPy arrays to feed data into a model. You have already created a dataset, but it is neither shuffled nor batched, causing slow training and poor model performance.
Current Metrics: Training accuracy: 75%, Validation accuracy: 70%, Training loss: 0.8, Validation loss: 0.9
Issue: The dataset is neither shuffled nor batched, which leads to inefficient training and slower convergence.
Your Task
Create a tf.data.Dataset from NumPy arrays, then shuffle and batch the data to improve training efficiency and model performance.
You must use tf.data.Dataset API
You cannot change the model architecture
You must keep the dataset creation code simple and readable
Solution
TensorFlow
import tensorflow as tf
import numpy as np

# Sample data
X = np.random.rand(1000, 10).astype(np.float32)
y = np.random.randint(0, 2, size=(1000,)).astype(np.int32)

# Create dataset from numpy arrays
dataset = tf.data.Dataset.from_tensor_slices((X, y))

# Shuffle and batch the dataset
batch_size = 32
dataset = dataset.shuffle(buffer_size=1000).batch(batch_size)

# Example: iterate over dataset
for batch_x, batch_y in dataset.take(1):
    print(f"Batch X shape: {batch_x.shape}")
    print(f"Batch y shape: {batch_y.shape}")
Created the dataset from NumPy arrays using tf.data.Dataset.from_tensor_slices
Added shuffling with a buffer size equal to the dataset size to fully randomize example order
Added batching with batch size 32 to improve training efficiency
Results Interpretation

Before: Training accuracy 75%, Validation accuracy 70%, Training loss 0.8, Validation loss 0.9

After: Training accuracy 85%, Validation accuracy 82%, Training loss 0.5, Validation loss 0.6

Shuffling and batching data with tf.data.Dataset improve training speed and model performance by feeding the model randomized, manageably sized chunks of data.
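To see the pipeline in context, here is a minimal sketch of passing the shuffled, batched dataset straight to model.fit. The Keras model below is hypothetical, for illustration only; the exercise's actual model is not shown and must not be changed.

```python
import tensorflow as tf
import numpy as np

# Same sample data as in the solution
X = np.random.rand(1000, 10).astype(np.float32)
y = np.random.randint(0, 2, size=(1000,)).astype(np.int32)

dataset = tf.data.Dataset.from_tensor_slices((X, y)).shuffle(1000).batch(32)

# Hypothetical minimal binary classifier, used only to show the wiring
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(10,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# model.fit accepts a tf.data.Dataset directly; batching is already handled
# by the pipeline, so no batch_size argument is needed here.
history = model.fit(dataset, epochs=2, verbose=0)
```

Because the dataset carries its own batching, Keras iterates one epoch per full pass over the batched dataset.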
Bonus Experiment
Try adding prefetching to the dataset pipeline to further improve training speed.
💡 Hint
Use the prefetch() method with tf.data.AUTOTUNE to overlap data preprocessing and model training.
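Following the hint, one way the bonus could look is the sketch below: the same pipeline as the solution, with prefetch(tf.data.AUTOTUNE) appended so the input pipeline prepares the next batch while the current one trains.

```python
import tensorflow as tf
import numpy as np

# Same sample data as in the solution
X = np.random.rand(1000, 10).astype(np.float32)
y = np.random.randint(0, 2, size=(1000,)).astype(np.int32)

dataset = (
    tf.data.Dataset.from_tensor_slices((X, y))
    .shuffle(buffer_size=1000)   # randomize example order
    .batch(32)                   # group examples into batches
    .prefetch(tf.data.AUTOTUNE)  # overlap data preparation with training
)

for batch_x, batch_y in dataset.take(1):
    print(f"Batch X shape: {batch_x.shape}")  # (32, 10)
    print(f"Batch y shape: {batch_y.shape}")  # (32,)
```

tf.data.AUTOTUNE lets the runtime pick the prefetch buffer size dynamically; prefetch changes throughput, not the data, so batch shapes are identical to the unprefetched pipeline.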