
Chunked reading for large files in Data Analysis Python

Introduction

Sometimes a file is too big to load into memory all at once. Chunked reading lets you read a big file piece by piece, so the whole file never has to fit in memory.

Chunked reading is useful:

When you have a huge CSV file that can't fit into your computer's memory.
When you want to process data step by step instead of all at once.
When you want to save memory while analyzing large datasets.
When you want to see partial results sooner by reading and analyzing data in smaller parts.
Syntax
Python
import pandas as pd

for chunk in pd.read_csv('filename.csv', chunksize=1000):
    # process each chunk here
    print(chunk.head())

chunksize sets how many rows pandas reads at a time.

Each chunk is a regular, smaller DataFrame, so any DataFrame method works on it.
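To see this in action, the short loop below numbers each chunk and confirms that every one is an ordinary DataFrame (the file 'demo.csv' is a throwaway created just for the demonstration):

```python
import pandas as pd

# Create a small throwaway CSV (6 rows) for the demonstration
pd.DataFrame({'a': range(6)}).to_csv('demo.csv', index=False)

# chunksize=2 yields three chunks; each one is a normal DataFrame
for i, chunk in enumerate(pd.read_csv('demo.csv', chunksize=2)):
    print(i, type(chunk).__name__, len(chunk))
```

Running it prints DataFrame for every chunk, with 2 rows in each of the three chunks.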

Examples
This example reads the file in chunks of 500 rows and prints each chunk's shape (rows, columns).
Python
import pandas as pd

for chunk in pd.read_csv('data.csv', chunksize=500):
    print(chunk.shape)
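Chunks can also be filtered as you go, keeping only the matching rows from each part. The sketch below invents its own small file ('big.csv', with values 0-99) so it is runnable on its own:

```python
import pandas as pd

# Create a small CSV to stand in for a large file (values 0..99)
pd.DataFrame({'value': range(100)}).to_csv('big.csv', index=False)

# Keep only rows with value > 90 from each 25-row chunk
matches = []
for chunk in pd.read_csv('big.csv', chunksize=25):
    matches.append(chunk[chunk['value'] > 90])

result = pd.concat(matches)
print(len(result))  # 9 rows (values 91..99)
```

Because only the matching rows are kept, the full file is never in memory at one time.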
This counts the total number of rows by adding up rows from each chunk.
Python
import pandas as pd

chunks = pd.read_csv('data.csv', chunksize=1000)
total_rows = 0
for chunk in chunks:
    total_rows += len(chunk)
print(f'Total rows: {total_rows}')
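The same pattern extends beyond row counts. For example, a mean can be combined from a running sum and a running count (the file here, 'data.csv' with values 0-9, is created just so the sketch runs on its own):

```python
import pandas as pd

# Create a small sample file (values 0..9)
pd.DataFrame({'x': range(10)}).to_csv('data.csv', index=False)

# Accumulate the sum and the row count across chunks
total, count = 0, 0
for chunk in pd.read_csv('data.csv', chunksize=4):
    total += chunk['x'].sum()
    count += len(chunk)

print(f'Mean: {total / count}')  # Mean: 4.5
```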
Sample Program

This program creates a CSV file with 2500 numbers, reads it in chunks of 1000 rows, and sums all numbers step-by-step.

Python
import pandas as pd

# Create a sample CSV file with 2500 rows
sample_data = {'number': range(2500)}
df = pd.DataFrame(sample_data)
df.to_csv('sample_large.csv', index=False)

# Read the CSV file in chunks of 1000 rows
chunks = pd.read_csv('sample_large.csv', chunksize=1000)
total_sum = 0
for chunk in chunks:
    total_sum += chunk['number'].sum()

print(f'Sum of all numbers: {total_sum}')
Output

Sum of all numbers: 3123750
Important Notes

The right chunk size depends on your available memory and the size of the file; larger chunks process faster but use more memory.

Processing chunks separately helps avoid crashes from memory overload.

You can combine results from chunks to get final answers.
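As one sketch of combining chunk results (the 'sales.csv' file and its columns are invented for illustration): aggregate each chunk with groupby, then aggregate the partial results again at the end.

```python
import pandas as pd

# Invented sales file: one row per sale, with a category and an amount
pd.DataFrame({
    'category': ['a', 'b', 'a', 'b', 'a', 'b'],
    'amount':   [10, 20, 30, 40, 50, 60],
}).to_csv('sales.csv', index=False)

# Sum amounts per category within each chunk...
partials = []
for chunk in pd.read_csv('sales.csv', chunksize=2):
    partials.append(chunk.groupby('category')['amount'].sum())

# ...then combine the partial sums into the final answer
totals = pd.concat(partials).groupby(level=0).sum()
print(totals)  # a: 90, b: 120
```

This two-step aggregation gives the same answer as a single groupby over the whole file, without loading it all at once.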

Summary

Chunked reading lets you handle big files in small parts.

Use chunksize to control how many rows to read at once.

Process each chunk and combine results for full analysis.