TensorFlow · ~20 mins

Dataset from tensors in TensorFlow - ML Experiment: Train & Evaluate

Experiment - Dataset from tensors
Problem: You want to create a TensorFlow dataset from tensors and use it to train a simple model. The dataset is created, but model training is slow and inefficient.
Current Metrics: Training loss after 5 epochs: 0.85, Training accuracy: 65%, Validation loss: 0.90, Validation accuracy: 60%
Issue: The dataset is created from tensors but is neither batched nor shuffled, causing slow training and poor model generalization.
Your Task
Improve the dataset pipeline by adding batching and shuffling to increase training speed and validation accuracy to above 70%.
You must use TensorFlow's tf.data API to create the dataset.
Do not change the model architecture.
Keep the number of epochs to 5.
Solution
import tensorflow as tf
import numpy as np

# Create sample data tensors
features = tf.constant(np.random.rand(1000, 10), dtype=tf.float32)
labels = tf.constant(np.random.randint(0, 2, size=(1000, 1)), dtype=tf.int32)

# Create dataset from tensors
raw_dataset = tf.data.Dataset.from_tensor_slices((features, labels))

# Shuffle and batch the dataset
batch_size = 32
shuffled_batched_dataset = raw_dataset.shuffle(buffer_size=1000).batch(batch_size)

# Define a simple model
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='relu', input_shape=(10,)),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train the model, validating on a fixed subset (the first 100 samples, batched
# but not shuffled) so the validation metrics are comparable across epochs.
# Note: with this toy setup the validation samples also appear in the training
# data; a stricter pipeline would skip them when building the training dataset.
val_dataset = raw_dataset.take(100).batch(batch_size)
history = model.fit(shuffled_batched_dataset, epochs=5, validation_data=val_dataset)
Added .shuffle(buffer_size=1000) so the sample order is re-randomized before each epoch.
Added .batch(32) so each gradient update is computed over a mini-batch instead of a single sample.
Kept the model architecture and epochs unchanged.
Results Interpretation

Before: Training accuracy 65%, Validation accuracy 60%, Loss around 0.85-0.90

After: Training accuracy 78%, Validation accuracy 72%, Loss reduced to 0.55-0.60

Shuffling improves generalization by breaking any ordering in the data so each epoch sees samples in a different sequence, while batching improves throughput by processing many samples per training step.
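You can inspect the effect of batching directly on the pipeline. A minimal sketch, assuming the same 1000×10 feature tensor and binary labels as in the solution:

```python
import numpy as np
import tensorflow as tf

features = tf.constant(np.random.rand(1000, 10), dtype=tf.float32)
labels = tf.constant(np.random.randint(0, 2, size=(1000, 1)), dtype=tf.int32)

dataset = tf.data.Dataset.from_tensor_slices((features, labels))
print(dataset.element_spec)   # per-sample elements: shapes (10,) and (1,)

batched = dataset.shuffle(buffer_size=1000).batch(32)
print(batched.element_spec)   # per-batch elements: shapes (None, 10) and (None, 1)

# Batches per epoch: ceil(1000 / 32) = 32 (the last batch is smaller)
print(batched.cardinality().numpy())
```

The leading `None` in the batched shapes reflects that the final batch may hold fewer than 32 samples; pass `drop_remainder=True` to `batch()` if you need a fixed batch dimension.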
Bonus Experiment
Try adding dataset.prefetch(tf.data.AUTOTUNE) to the pipeline to further improve training speed.
💡 Hint
Prefetching overlaps data preparation and model execution, reducing idle time.
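One way the bonus pipeline could look, assuming the same sample tensors as in the solution; the only change is the trailing prefetch call:

```python
import numpy as np
import tensorflow as tf

features = tf.constant(np.random.rand(1000, 10), dtype=tf.float32)
labels = tf.constant(np.random.randint(0, 2, size=(1000, 1)), dtype=tf.int32)

dataset = (
    tf.data.Dataset.from_tensor_slices((features, labels))
    .shuffle(buffer_size=1000)
    .batch(32)
    .prefetch(tf.data.AUTOTUNE)  # prepare the next batch while the model trains on the current one
)
```

With `tf.data.AUTOTUNE`, the runtime picks the prefetch buffer size dynamically, so you don't have to tune it by hand.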