Log management and troubleshooting in Hadoop - Time & Space Complexity
When managing logs in Hadoop, it is important to understand how processing time grows as log data accumulates: specifically, how the time needed to search or analyze the logs changes as the number of log entries increases.
Analyze the time complexity of the following code snippet.
```python
import os

def scan_logs(log_directory, error_keyword):
    """Scan every log file in log_directory for lines containing error_keyword."""
    error_list = []
    for log_file in os.listdir(log_directory):        # each log file
        with open(os.path.join(log_directory, log_file)) as f:
            for line in f:                             # each line in the file
                if error_keyword in line:              # one keyword check per line
                    error_list.append(line)
    return error_list

# After scanning all logs, print all error lines
for error_line in scan_logs("logs", "ERROR"):
    print(error_line, end="")
```
This code scans all log files line by line to find error messages and then prints them.
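One design point worth noting: the inner loop reads each file line by line instead of loading it whole, so memory use stays proportional to the matching lines kept in error_list rather than to the total size of the logs, which matters when Hadoop logs run to gigabytes.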
Identify the constructs that repeat: loops, recursion, and array traversals.
- Primary operation: checking each line of every log file for the error keyword.
- How many times it runs: once per line, across all log files combined.
As the number of log lines grows, the time to scan all lines grows too.
| Input Size (n lines) | Approx. Operations |
|---|---|
| 10 | About 10 line checks |
| 100 | About 100 line checks |
| 1000 | About 1000 line checks |
Pattern observation: The time grows roughly in direct proportion to the number of log lines.
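A quick way to confirm the pattern in the table is to replay the loop on synthetic input and count the checks. This is a minimal sketch; the count_checks helper and the one-error-per-100-lines log format are illustrative assumptions, not part of the original exercise:

```python
def count_checks(lines, error_keyword="ERROR"):
    """Replay the scan loop, counting one keyword check per line."""
    checks = matches = 0
    for line in lines:
        checks += 1                        # every line is checked...
        if error_keyword in line:          # ...whether or not it matches
            matches += 1
    return checks, matches

for n in (10, 100, 1000):
    # Synthetic log: only every 100th line is an error, so matches stay few
    lines = ["ERROR disk full" if i % 100 == 0 else "INFO task ok" for i in range(n)]
    checks, matches = count_checks(lines)
    print(f"n={n:4d}  checks={checks:4d}  matches={matches:2d}")
```

Running this shows checks always equals n while matches stays small: the cost tracks the total number of lines, not the number of errors, which is exactly the point made below.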
Time Complexity: O(n), where n is the total number of log lines across all files.
This means the time to find errors grows linearly with the number of log lines. Space complexity follows from errorList: the scan streams one line at a time, but every matching line is kept, so extra space is O(e) for e error lines, or O(n) in the worst case where every line matches.
[X] Wrong: "Searching logs is always fast because we only look for a few errors."
[OK] Correct: Even if errors are few, the code still checks every line, so time depends on total lines, not error count.
Understanding how log-scanning time grows helps you explain, calmly and clearly, how to handle large data volumes in real systems.
"What if we indexed the logs by error type first? How would the time complexity change?"