
Dataset from files in TensorFlow

Introduction

We use datasets from files to load and work with data stored on disk, which makes it easy to train machine learning models on real data. Typical use cases:

You have images saved in folders and want to train a model to recognize them.
You want to read text data from files to analyze or build a language model.
You have CSV files with tabular data for prediction tasks.
You want to load large datasets without loading everything into memory at once.
You want to preprocess data as you load it for efficient training.
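One of the cases above, CSV files with tabular data, can be handled directly with tf.data.experimental.make_csv_dataset, which streams rows from disk in batches. A minimal sketch, assuming a small points.csv file that we create here purely for illustration:

```python
import csv
import tensorflow as tf

# Create a small example CSV file
with open("points.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["x", "y", "label"])
    writer.writerows([[1.0, 2.0, 0], [3.0, 4.0, 1], [5.0, 6.0, 0]])

# Load the CSV as a dataset of (features, label) batches
csv_dataset = tf.data.experimental.make_csv_dataset(
    "points.csv",
    batch_size=2,
    label_name="label",
    num_epochs=1,    # read the file once instead of looping forever
    shuffle=False,
)

for features, labels in csv_dataset.take(1):
    print(sorted(features.keys()), labels.numpy())
```

Each batch yields a dictionary of feature columns plus a separate label tensor, so the output can feed straight into model.fit.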
Syntax
TensorFlow
tf.data.Dataset.from_tensor_slices(filenames)

# or for images
image_dataset = tf.keras.utils.image_dataset_from_directory(directory_path, batch_size=32, image_size=(256, 256))

from_tensor_slices creates a dataset from a list of file paths.

image_dataset_from_directory loads images from folders and labels them automatically.

Examples
This creates a dataset from a list of text file names and prints each file name.
TensorFlow
import tensorflow as tf

filenames = ["file1.txt", "file2.txt"]
dataset = tf.data.Dataset.from_tensor_slices(filenames)
for file in dataset:
    print(file.numpy())
This loads images from a folder called 'images', resizes them to 128x128, and prints the shape of one batch and its labels.
TensorFlow
import tensorflow as tf

image_dataset = tf.keras.utils.image_dataset_from_directory(
    "./images",
    batch_size=16,
    image_size=(128, 128)
)

for images, labels in image_dataset.take(1):
    print(images.shape, labels.numpy())
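When the data does not fit in memory, the same pipeline style streams it in batches while the next batch is prepared in the background. A minimal sketch using a small in-memory range as a stand-in for file-backed data:

```python
import tensorflow as tf

# Stand-in for a file-backed dataset: 10 integer elements
dataset = tf.data.Dataset.range(10)

# Group elements into batches and prefetch the next batch
# while the current one is being consumed
pipeline = dataset.batch(4).prefetch(tf.data.AUTOTUNE)

for batch in pipeline:
    print(batch.numpy())
# → [0 1 2 3]
#   [4 5 6 7]
#   [8 9]
```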
Sample Model

This program creates two text files, loads their paths into a TensorFlow dataset, reads the file contents, and prints them.

TensorFlow
import tensorflow as tf
import os

# Create example text files
os.makedirs('data', exist_ok=True)
with open('data/file1.txt', 'w') as f:
    f.write('Hello TensorFlow')
with open('data/file2.txt', 'w') as f:
    f.write('Dataset from files')

# List of file paths
filenames = ["data/file1.txt", "data/file2.txt"]

# Create dataset from file names
file_dataset = tf.data.Dataset.from_tensor_slices(filenames)

# Function to read file contents (Dataset.map traces this automatically)
def read_file(filename):
    return tf.io.read_file(filename)

# Map function to dataset
text_dataset = file_dataset.map(read_file)

# Print contents
for text in text_dataset:
    print(text.numpy().decode('utf-8'))
Output
Hello TensorFlow
Dataset from files
Important Notes

Use tf.io.read_file to read file contents inside the dataset pipeline.

Mapping a function over the dataset lets you preprocess data as it is loaded.

Datasets stream data from disk, so large datasets can be handled without loading everything into memory at once.
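As a small illustration of preprocessing inside the pipeline, a map step can transform each element as it is read. The strings below are stand-ins for text read from files:

```python
import tensorflow as tf

# Stand-ins for text read from files
texts = tf.data.Dataset.from_tensor_slices(
    ["Hello TensorFlow", "Dataset From Files"]
)

# Preprocess each element as it is loaded: lowercase, then split into words
processed = texts.map(lambda t: tf.strings.split(tf.strings.lower(t)))

for words in processed:
    print(words.numpy())
```

Because the transformation runs inside the pipeline, it applies lazily to each element during iteration rather than to the whole dataset up front.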

Summary

Datasets from files let you load data stored on disk easily.

You can create datasets from file paths or directly from image folders.

Use mapping to read and preprocess file contents during loading.