
tf.data.Dataset creation in TensorFlow

Introduction

tf.data.Dataset is TensorFlow's API for building input pipelines. It lets us load data, transform it, and feed it to a model step by step.

Use it in situations like these:
When you have a list or array of data and want to process it in batches.
When you want to read data from files, such as images or text, for training.
When you need to shuffle or repeat data during training.
When you want to apply transformations to your data, such as mapping a function over each element.
When you want to build efficient input pipelines for TensorFlow models.
Syntax
TensorFlow
tf.data.Dataset.from_tensor_slices(data)
tf.data.Dataset.from_generator(generator_function, output_signature=output_signature)
tf.data.Dataset.from_tensors(tensor)

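from_tensor_slices splits the data along its first dimension into elements (like rows).

from_generator creates a dataset from a Python generator, which suits data that is produced dynamically.

from_tensors wraps the whole input as a single dataset element.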
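To make the difference between from_tensor_slices and from_tensors concrete, here is a minimal sketch that builds both datasets from the same 2x2 tensor and collects their elements. The variable names (matrix, sliced, whole) are illustrative only.

```python
import tensorflow as tf

matrix = tf.constant([[1, 2], [3, 4]])

# from_tensor_slices: one element per row (slices along the first dimension)
sliced = tf.data.Dataset.from_tensor_slices(matrix)

# from_tensors: the whole tensor becomes a single element
whole = tf.data.Dataset.from_tensors(matrix)

sliced_rows = [row.numpy().tolist() for row in sliced]
whole_items = [item.numpy().tolist() for item in whole]

print(len(sliced_rows), sliced_rows)  # 2 elements, each a row
print(len(whole_items), whole_items)  # 1 element, the full matrix
```

The same input gives two elements in one case and one element in the other, so the choice decides what a single training example looks like.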

Examples
This creates a dataset from a simple list and prints each item.
TensorFlow
import tensorflow as tf

# Create dataset from a list
data = [1, 2, 3, 4]
dataset = tf.data.Dataset.from_tensor_slices(data)
for item in dataset:
    print(item.numpy())
This creates a dataset with one element (the whole tensor).
TensorFlow
import tensorflow as tf

# Create dataset from a single tensor
tensor = tf.constant([[1, 2], [3, 4]])
dataset = tf.data.Dataset.from_tensors(tensor)
for item in dataset:
    print(item.numpy())
This creates a dataset from a generator function that yields values.
TensorFlow
import tensorflow as tf

def gen():
    for i in range(3):
        yield i * 2

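# output_signature declares the shape and dtype of each yielded value
# (it replaces the deprecated output_types argument)
dataset = tf.data.Dataset.from_generator(
    gen, output_signature=tf.TensorSpec(shape=(), dtype=tf.int32)
)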
for item in dataset:
    print(item.numpy())
Sample Program

This program creates a dataset from a list of numbers and prints each number. It shows how to start using tf.data.Dataset with simple data.

TensorFlow
import tensorflow as tf

# Sample data: list of numbers
numbers = [10, 20, 30, 40, 50]

# Create dataset from the list
dataset = tf.data.Dataset.from_tensor_slices(numbers)

# Print each element
print("Dataset elements:")
for element in dataset:
    print(element.numpy())
Important Notes

Datasets created with from_tensor_slices split the input along its first dimension, so an input with N rows gives a dataset of N elements.
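The same slicing applies when a tuple is passed in: each component is sliced along its first dimension and the slices are paired up. The sketch below assumes a small, made-up set of features and labels to show the pairing.

```python
import tensorflow as tf

# Hypothetical toy data: three feature rows and three matching labels
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
labels = [0, 1, 0]

# Passing a tuple yields (feature_row, label) pairs, one per row
pairs = tf.data.Dataset.from_tensor_slices((features, labels))

examples = [(x.numpy().tolist(), int(y.numpy())) for x, y in pairs]
for feature_row, label in examples:
    print(feature_row, label)
```

All components of the tuple must have the same size in the first dimension, otherwise the dataset cannot be built.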

Use from_generator when data is too large to fit in memory or needs to be generated on the fly.
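As a rough illustration of generating data on the fly, the sketch below yields sequences of different lengths from a hypothetical generator named growing_sequences. It assumes the output_signature argument of from_generator, with shape=(None,) marking the length as variable.

```python
import tensorflow as tf

def growing_sequences():
    # Hypothetical generator: yields [0], [0, 1], [0, 1, 2]
    for n in range(1, 4):
        yield list(range(n))

# shape=(None,) means each element is a 1-D tensor of unknown length
dataset = tf.data.Dataset.from_generator(
    growing_sequences,
    output_signature=tf.TensorSpec(shape=(None,), dtype=tf.int32),
)

sequences = [seq.numpy().tolist() for seq in dataset]
for seq in sequences:
    print(seq)
```

Each value is produced only when the dataset asks for it, so the full data never has to sit in memory at once.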

Dataset methods like batch(), shuffle(), and map() can be chained to build more complex pipelines.
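A minimal sketch of such a chain, assuming a small list of integers as input: map() transforms each element, shuffle() randomizes the order, and batch() groups elements together. The buffer size, seed, and batch size are arbitrary choices for illustration.

```python
import tensorflow as tf

dataset = (
    tf.data.Dataset.from_tensor_slices([1, 2, 3, 4, 5, 6])
    .map(lambda x: x * 10)            # transform each element
    .shuffle(buffer_size=6, seed=0)   # randomize the order
    .batch(4)                         # group into batches of up to 4
)

batches = [batch.numpy().tolist() for batch in dataset]
for batch in batches:
    print(batch)  # element order depends on the shuffle
```

Six elements with a batch size of 4 give one full batch and one partial batch of 2. Passing drop_remainder=True to batch() would drop the partial one.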

Summary

tf.data.Dataset helps manage data for TensorFlow models easily.

You can create datasets from lists, tensors, or generators.

Datasets let you process data step-by-step for training or evaluation.