Handling large files efficiently in Python - Time & Space Complexity
When working with large files, it is important to understand how the time to process them grows as the file size increases.
We want to know how the program's running time changes when the file gets bigger.
Analyze the time complexity of the following code snippet.
with open('large_file.txt', 'r') as file:
    for line in file:
        process(line)  # some operation on each line
This code reads a large file line by line and processes each line one at a time.
- Primary operation: Looping through each line of the file.
- How many times: Once for every line in the file.
As the number of lines in the file grows, the number of times we process lines grows at the same rate.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 |
| 100 | 100 |
| 1000 | 1000 |
Pattern observation: The work grows directly with the number of lines; doubling lines doubles work.
Time Complexity: O(n)
This means the running time grows linearly with the number of lines in the file, assuming each call to process(line) takes roughly constant time.
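One way to check the table above empirically is to count how often the per-line step runs for files of different sizes. The sketch below is illustrative only: `count_operations` and `make_file` are hypothetical helpers written for this demo, and the counter stands in for `process(line)`.

```python
import os
import tempfile


def count_operations(path):
    """Return how many times the per-line step runs for the file at `path`."""
    operations = 0
    with open(path, 'r') as file:
        for line in file:
            operations += 1  # stands in for process(line)
    return operations


def make_file(num_lines):
    """Create a temporary text file with `num_lines` lines and return its path."""
    with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
        for i in range(num_lines):
            tmp.write(f"line {i}\n")
    return tmp.name


if __name__ == "__main__":
    for n in (10, 100, 1000):
        path = make_file(n)
        try:
            # The count should match n, mirroring the table.
            print(n, count_operations(path))
        finally:
            os.remove(path)
```

The operation count equals the number of lines, which is the linear growth described by O(n).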
[X] Wrong: "Reading the whole file at once is always faster than line by line."
[OK] Correct: Reading all at once can use too much memory and slow down the program, especially for very large files.
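The memory difference can be observed with the standard `tracemalloc` module. This is a rough sketch, not a benchmark: the helper names are made up for the demo, the test file is generated on the spot, and the exact byte counts will vary by platform and Python version.

```python
import os
import tempfile
import tracemalloc


def read_all_at_once(path):
    """Load the whole file into one string and return its length."""
    with open(path, 'r') as f:
        return len(f.read())


def read_line_by_line(path):
    """Stream the file one line at a time and return the total length."""
    total = 0
    with open(path, 'r') as f:
        for line in f:
            total += len(line)
    return total


def peak_bytes(func, path):
    """Return the peak traced allocation (in bytes) while func(path) runs."""
    tracemalloc.start()
    try:
        func(path)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


if __name__ == "__main__":
    # Build a sample file of roughly 1 MB (size chosen arbitrarily for the demo).
    with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
        for i in range(20000):
            tmp.write(f"{i:06d} " + "x" * 42 + "\n")
    try:
        print("read():       ", peak_bytes(read_all_at_once, tmp.name), "bytes")
        print("line by line: ", peak_bytes(read_line_by_line, tmp.name), "bytes")
    finally:
        os.remove(tmp.name)
```

With `read()`, peak memory scales with the file size; with line-by-line iteration, it stays close to the size of the read buffer plus one line.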
Understanding how to handle large files efficiently shows you can write programs that work well even with big data, a useful skill in many real projects.
"What if we read the file in chunks of 100 lines instead of one line at a time? How would the time complexity change?"
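To explore this question, here is one possible way to read 100 lines at a time, using `itertools.islice` on the file object. The function name and return values are assumptions made for illustration. Every line is still read and processed exactly once, so the total work stays O(n); only the number of outer loop iterations shrinks to about n / 100.

```python
import os
import tempfile
from itertools import islice


def process_in_chunks(path, chunk_size=100):
    """Read `path` in groups of up to `chunk_size` lines.

    Returns (number_of_chunks, number_of_lines_processed).
    """
    chunks = 0
    lines_seen = 0
    with open(path, 'r') as file:
        while True:
            # islice pulls at most chunk_size lines from the file iterator.
            chunk = list(islice(file, chunk_size))
            if not chunk:
                break
            chunks += 1
            for line in chunk:
                lines_seen += 1  # stands in for process(line)
    return chunks, lines_seen


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
        for i in range(250):
            tmp.write(f"line {i}\n")
    try:
        print(process_in_chunks(tmp.name))
    finally:
        os.remove(tmp.name)
```

Memory use is bounded by the chunk size rather than by the file size, so this keeps the low-memory benefit of line-by-line reading.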
Practice
Which method is best to read a very large text file without using too much memory?
with open('file.txt', 'r') as f:
Solution
Step 1: Understand memory usage when reading files
Reading the entire file at once loads all content into memory, which is bad for large files.
Step 2: Use line-by-line reading to save memory
Using `for line in f:` reads one line at a time, keeping memory low.
Final Answer:
Read the file line by line using a loop like `for line in f:` -> Option C
Quick Check:
Line-by-line reading = low memory use [OK]
- Using f.read() loads whole file into memory
- Using f.readlines() loads all lines at once
- Converting file to list loads entire file
Which of the following is the correct syntax to open a file for writing and ensure it closes automatically?
Solution
Step 1: Identify syntax for safe file handling
The `with` statement opens the file and ensures it closes automatically after the block.
Step 2: Check mode and variable assignment
Using `with open('file.txt', 'w') as f:` opens for writing and assigns to `f`.
Final Answer:
`with open('file.txt', 'w') as f:` -> Option B
Quick Check:
Use with open() for safe file handling [OK]
- Forgetting to close file after open()
- Using wrong mode like 'r' for writing
- Not assigning file object to a variable
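The automatic closing can be verified through the file object's `closed` attribute. The following is a minimal sketch; `write_greeting` and the temporary path are hypothetical names used only for this demo.

```python
import os
import tempfile


def write_greeting(path):
    """Write one line to `path` and return the file object for inspection."""
    with open(path, 'w') as f:
        f.write("hello\n")
        assert not f.closed  # still open inside the block
    return f  # the with statement has already closed it


if __name__ == "__main__":
    fd, path = tempfile.mkstemp(suffix='.txt')
    os.close(fd)
    try:
        handle = write_greeting(path)
        print("closed after block:", handle.closed)
    finally:
        os.remove(path)
```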
What will be the output of this code snippet when reading a large file in chunks?
with open('largefile.txt', 'r') as f:
    chunk = f.read(5)
    print(chunk)
    chunk = f.read(5)
    print(chunk)
Solution
Step 1: Understand read(size) behavior
Calling `f.read(5)` reads up to 5 characters from the current file position (fewer if the end of the file is reached).
Step 2: Reading twice moves file pointer forward
First read gets chars 1-5, second read gets chars 6-10.
Final Answer:
Prints first 5 characters, then next 5 characters of the file -> Option A
Quick Check:
read(5) reads 5 chars sequentially [OK]
- Thinking read() reads whole file always
- Assuming read(5) resets file pointer
- Believing read() without args is invalid
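The pointer movement can be seen on a small sample file. This sketch uses hypothetical helper names and a temporary file whose contents are chosen for the demo; `iter_chunks` extends the same idea into a loop that stops when `read()` returns an empty string at the end of the file.

```python
import os
import tempfile


def read_two_chunks(path, size=5):
    """Return two consecutive reads of up to `size` characters each."""
    with open(path, 'r') as f:
        first = f.read(size)   # characters 1..size
        second = f.read(size)  # continues where the first read stopped
    return first, second


def iter_chunks(path, size=5):
    """Yield successive chunks of up to `size` characters until end of file."""
    with open(path, 'r') as f:
        while True:
            chunk = f.read(size)
            if not chunk:  # an empty string signals end of file
                break
            yield chunk


if __name__ == "__main__":
    fd, path = tempfile.mkstemp(suffix='.txt')
    os.close(fd)
    try:
        with open(path, 'w') as out:
            out.write("abcdefghijklmno")
        print(read_two_chunks(path))    # ('abcde', 'fghij')
        print(list(iter_chunks(path)))  # ['abcde', 'fghij', 'klmno']
    finally:
        os.remove(path)
```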
Find the error in this code that tries to write lines to a file efficiently:
lines = ['line1\n', 'line2\n', 'line3\n']
file = open('output.txt', 'w')
for line in lines:
    file.write(line)
file.close()
Solution
Step 1: Check file handling safety
Opening a file without `with` risks leaving it open if an error occurs before `close()`.
Step 2: Use `with open()` for automatic closing
Replacing with `with open('output.txt', 'w') as file:` ensures the file closes safely.
Final Answer:
Using with open() is better to ensure file closes -> Option A
Quick Check:
Use with open() to auto-close files [OK]
- Forgetting to close file on exceptions
- Opening file in wrong mode
- Misunderstanding readlines() vs list variable
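A possible rewrite of the snippet with `with open()` is sketched below. The function name `write_lines` and the temporary path are assumptions for the demo. If `file.write()` raises partway through (here, because a non-string item is passed on purpose), the `with` block still closes the file, so the lines written before the error are flushed to disk.

```python
import os
import tempfile


def write_lines(path, lines):
    """Write each string in `lines` to `path`; the file closes even on error."""
    with open(path, 'w') as file:
        for line in lines:
            file.write(line)  # raises TypeError if `line` is not a str


if __name__ == "__main__":
    fd, path = tempfile.mkstemp(suffix='.txt')
    os.close(fd)
    try:
        write_lines(path, ['line1\n', 'line2\n', 'line3\n'])
        with open(path, 'r') as check:
            print(check.read(), end='')

        try:
            write_lines(path, ['ok\n', 123])  # deliberate error on second item
        except TypeError as exc:
            print("write failed:", exc)
        with open(path, 'r') as check:
            print(repr(check.read()))  # the first line was still saved
    finally:
        os.remove(path)
```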
You need to process a huge log file and write only lines containing the word 'ERROR' to a new file. Which approach is best to handle this efficiently?
Solution
Step 1: Avoid loading entire file into memory
Reading whole file at once uses too much memory for huge files.
Step 2: Process line by line and write incrementally
Reading each line and writing matching lines immediately saves memory and is efficient.
Final Answer:
Read file line by line, write matching lines immediately to output file -> Option D
Quick Check:
Line-by-line processing + incremental write = efficient [OK]
- Loading entire file into memory
- Using wrong file mode for output
- Appending to output file opened in read mode
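The approach in the solution could look like the sketch below. The function name `copy_error_lines`, its parameters, and the sample log contents are hypothetical; the point is that only one line is held in memory at a time and each match is written as soon as it is found.

```python
import os
import tempfile


def copy_error_lines(src_path, dst_path, keyword='ERROR'):
    """Copy lines containing `keyword` from src_path to dst_path.

    Returns the number of lines written.
    """
    matches = 0
    # Open both files in one with statement so both close automatically.
    with open(src_path, 'r') as src, open(dst_path, 'w') as dst:
        for line in src:         # one line in memory at a time
            if keyword in line:
                dst.write(line)  # write the match immediately
                matches += 1
    return matches


if __name__ == "__main__":
    tmp_dir = tempfile.mkdtemp()
    log_path = os.path.join(tmp_dir, 'app.log')
    out_path = os.path.join(tmp_dir, 'errors.log')
    with open(log_path, 'w') as log:
        log.write("INFO start\nERROR disk full\nINFO retry\nERROR timeout\n")
    print(copy_error_lines(log_path, out_path), "matching lines written")
    with open(out_path, 'r') as result:
        print(result.read(), end='')
    os.remove(log_path)
    os.remove(out_path)
    os.rmdir(tmp_dir)
```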
