Large files can be too big to load into memory all at once. Handling them efficiently means reading or writing them piece by piece, which keeps memory use low and programs fast.
Handling large files efficiently in Python
with open('filename.txt', 'r') as file:
    for line in file:
        pass  # process each line here
The with statement safely opens and closes the file.
Reading line by line uses little memory, good for big files.
with open('bigfile.txt', 'r') as file:
    for line in file:
        print(line.strip())
with open('bigfile.txt', 'rb') as file:
    chunk = file.read(1024)
    while chunk:
        # process chunk
        chunk = file.read(1024)
with open('output.txt', 'w') as file:
    for i in range(1000000):
        file.write(f'Line {i}\n')
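Chunked reading and incremental writing can be combined. The following is a minimal sketch, not a definitive implementation: the function name copy_in_chunks, the 64 KB default chunk size, and the temporary file names are all assumptions made for illustration.

```python
import os
import tempfile


def copy_in_chunks(src_path, dst_path, chunk_size=64 * 1024):
    """Copy src_path to dst_path one chunk at a time; return bytes copied."""
    copied = 0
    with open(src_path, 'rb') as src, open(dst_path, 'wb') as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # an empty bytes object means end of file
                break
            dst.write(chunk)
            copied += len(chunk)
    return copied


# Demo on a throwaway file (hypothetical names, created in a temp directory).
with tempfile.TemporaryDirectory() as tmpdir:
    src_path = os.path.join(tmpdir, 'bigfile.bin')
    dst_path = os.path.join(tmpdir, 'copy.bin')
    with open(src_path, 'wb') as f:
        f.write(b'x' * 200_000)

    copied = copy_in_chunks(src_path, dst_path)
    same_size = os.path.getsize(dst_path) == os.path.getsize(src_path)

print(copied)
```

At most one chunk is held in memory at a time, regardless of file size. The standard library's shutil.copyfileobj offers a similar chunked copy if you do not need custom processing per chunk.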
This program creates a small file with 5 lines, then counts the lines by reading the file line by line without loading it all at once.
def count_lines(filename):
    count = 0
    with open(filename, 'r') as file:
        for line in file:
            count += 1
    return count

filename = 'sample.txt'
with open(filename, 'w') as f:
    for i in range(5):
        f.write(f'Line number {i+1}\n')

lines = count_lines(filename)
print(f'The file {filename} has {lines} lines.')
Always use with to open files to avoid forgetting to close them.
Reading files line by line or in chunks helps keep memory use low.
For very large files, avoid reading the whole file into a list or string at once.
Use with open() to safely open and close files.
Read large files line by line or in small chunks to save memory.
Write to files incrementally instead of building big strings in memory.
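One possible way to write incrementally without building a big string is to pass a generator to writelines(), which accepts any iterable of strings. This is a sketch under assumptions: the generator name numbered_lines and the temporary output path are hypothetical.

```python
import os
import tempfile


def numbered_lines(count):
    """Yield lines one at a time instead of building a list or big string."""
    for i in range(count):
        yield f'Line {i}\n'


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'output.txt')

    # writelines() consumes the generator lazily, one line at a time.
    with open(path, 'w') as file:
        file.writelines(numbered_lines(100000))

    # Count the lines back, again without loading the whole file.
    with open(path, 'r') as file:
        line_count = sum(1 for _ in file)

print(line_count)
```

Note that writelines() does not add newline characters, so each yielded string carries its own '\n'.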
Practice
Which method is best to read a very large text file without using too much memory?
with open('file.txt', 'r') as f:
Solution
Step 1: Understand memory usage when reading files
Reading the entire file at once loads all content into memory, which is bad for large files.
Step 2: Use line-by-line reading to save memory
Using for line in f: reads one line at a time, keeping memory low.
Final Answer:
Read the file line by line using a loop like for line in f: -> Option C
Quick Check:
Line-by-line reading = low memory use [OK]
- Using f.read() loads whole file into memory
- Using f.readlines() loads all lines at once
- Converting file to list loads entire file
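The difference can be observed with the standard tracemalloc module. The sketch below is illustrative only: the helper names, the roughly 1 MB sample file, and its contents are assumptions, and the exact byte counts will vary between Python versions and platforms.

```python
import os
import tempfile
import tracemalloc


def peak_memory(func, path):
    """Return the peak number of bytes allocated while func(path) runs."""
    tracemalloc.start()
    try:
        func(path)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def read_whole(path):
    with open(path, 'r') as f:
        data = f.read()  # the entire file is held in memory at once
    return len(data)


def read_by_line(path):
    total = 0
    with open(path, 'r') as f:
        for line in f:  # one line (plus a small buffer) at a time
            total += len(line)
    return total


# Build a throwaway file of roughly 1 MB for the comparison.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    for i in range(20000):
        tmp.write(f'This is sample line number {i:06d} of the test file\n')
    sample_path = tmp.name

peak_whole = peak_memory(read_whole, sample_path)
peak_lines = peak_memory(read_by_line, sample_path)
os.remove(sample_path)

print(f'read():       {peak_whole} bytes at peak')
print(f'line by line: {peak_lines} bytes at peak')
```

The peak for read() grows with the file size, while the line-by-line peak stays roughly constant as long as individual lines are short.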
Which of the following is the correct syntax to open a file for writing and ensure it closes automatically?
Solution
Step 1: Identify syntax for safe file handling
The with statement opens the file and ensures it closes automatically after the block.
Step 2: Check mode and variable assignment
Using with open('file.txt', 'w') as f: opens for writing and assigns to f.
Final Answer:
with open('file.txt', 'w') as f: -> Option B
Quick Check:
Use with open() for safe file handling [OK]
- Forgetting to close file after open()
- Using wrong mode like 'r' for writing
- Not assigning file object to a variable
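A short sketch of these points, assuming a hypothetical file created in a temporary directory: the file is closed once the with block ends, and writing to a file opened in 'r' mode raises io.UnsupportedOperation.

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'file.txt')

    # 'w' mode: opens for writing; the file closes when the block ends.
    with open(path, 'w') as f:
        f.write('hello\n')
    closed_after_block = f.closed

    # 'r' mode: the file object is read-only, so write() fails.
    with open(path, 'r') as f:
        try:
            f.write('oops\n')
            write_failed = False
        except io.UnsupportedOperation:
            write_failed = True

    # The original content is untouched by the failed write.
    with open(path, 'r') as f:
        content = f.read()

print(closed_after_block, write_failed, repr(content))
```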
What will be the output of this code snippet when reading a large file in chunks?
with open('largefile.txt', 'r') as f:
chunk = f.read(5)
print(chunk)
chunk = f.read(5)
print(chunk)Solution
Step 1: Understand read(size) behavior
Calling f.read(5) reads 5 characters from the current file position.
Step 2: Reading twice moves file pointer forward
First read gets chars 1-5, second read gets chars 6-10.
Final Answer:
Prints first 5 characters, then next 5 characters of the file -> Option A
Quick Check:
read(5) reads 5 chars sequentially [OK]
- Thinking read() reads whole file always
- Assuming read(5) resets file pointer
- Believing read() without args is invalid
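The behavior of read(size) can be checked on a small file. This is a minimal sketch that assumes a hypothetical temporary file containing the lowercase alphabet; it also shows that read() with no argument is valid and returns everything from the current position onward.

```python
import os
import tempfile

# Create a small throwaway file with known content.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('abcdefghijklmnopqrstuvwxyz')
    path = tmp.name

with open(path, 'r') as f:
    first = f.read(5)    # characters 1-5
    second = f.read(5)   # characters 6-10: the position moved forward
    f.seek(0)            # the position only resets when you ask for it
    again = f.read(5)    # characters 1-5 once more
    rest = f.read()      # no argument: everything from the current position

os.remove(path)

print(first, second, again, rest)
```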
Find the error in this code that tries to write lines to a file efficiently:
lines = ['line1\n', 'line2\n', 'line3\n']
file = open('output.txt', 'w')
for line in lines:
    file.write(line)
file.close()
Solution
Step 1: Check file handling safety
Opening file without with risks leaving it open if error occurs before close().
Step 2: Use with open() for automatic closing
Replacing with with open('output.txt', 'w') as file: ensures file closes safely.
Final Answer:
Using with open() is better to ensure file closes -> Option A
Quick Check:
Use with open() to auto-close files [OK]
- Forgetting to close file on exceptions
- Opening file in wrong mode
- Misunderstanding readlines() vs list variable
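To see the automatic closing in action, the sketch below raises an error inside a with block and then checks the file afterwards. The simulated ValueError and the temporary output path are assumptions made for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'output.txt')

    try:
        with open(path, 'w') as file:
            file.write('line1\n')
            raise ValueError('simulated failure')  # error before the block ends
    except ValueError:
        pass

    # The with statement closed the file even though an exception occurred.
    closed_after_error = file.closed

    # Data written before the error was flushed when the file closed.
    with open(path, 'r') as check:
        saved = check.read()

print(closed_after_error, repr(saved))
```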
You need to process a huge log file and write only lines containing the word 'ERROR' to a new file. Which approach is best to handle this efficiently?
Solution
Step 1: Avoid loading entire file into memory
Reading whole file at once uses too much memory for huge files.
Step 2: Process line by line and write incrementally
Reading each line and writing matching lines immediately saves memory and is efficient.
Final Answer:
Read file line by line, write matching lines immediately to output file -> Option D
Quick Check:
Line-by-line processing + incremental write = efficient [OK]
- Loading entire file into memory
- Using wrong file mode for output
- Appending to output file opened in read mode
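The approach from this solution might look like the sketch below. The function name filter_lines, the log file names, and the sample log content are hypothetical and exist only to make the example self-contained.

```python
import os
import tempfile


def filter_lines(src_path, dst_path, keyword='ERROR'):
    """Copy lines containing keyword from src_path to dst_path; return the count."""
    matched = 0
    # Input is opened for reading and output for writing, both auto-closed.
    with open(src_path, 'r') as src, open(dst_path, 'w') as dst:
        for line in src:            # one line in memory at a time
            if keyword in line:
                dst.write(line)     # written immediately, not collected in a list
                matched += 1
    return matched


# Demo on a tiny made-up log file.
with tempfile.TemporaryDirectory() as tmpdir:
    log_path = os.path.join(tmpdir, 'app.log')
    out_path = os.path.join(tmpdir, 'errors.log')
    with open(log_path, 'w') as f:
        f.write('INFO service started\n')
        f.write('ERROR disk full\n')
        f.write('WARNING slow response\n')
        f.write('ERROR connection lost\n')

    matched = filter_lines(log_path, out_path)
    with open(out_path, 'r') as f:
        error_lines = f.read().splitlines()

print(matched, error_lines)
```

Memory use depends on the longest line rather than on the size of the log, so the same function works for a file of a few lines or many gigabytes.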
