Log Transformation for Skewed Data in Python Data Analysis - Time & Space Complexity
We want to understand how the time needed to apply a log transformation changes as the data size grows.
How does the work increase when we have more data points to transform?
Analyze the time complexity of the following code snippet.
import numpy as np
import pandas as pd
def log_transform(data):
    # Add 1 before taking the log so zero values stay defined
    return np.log(data + 1)
# Example usage
values = pd.Series([10, 100, 1000, 10000])
transformed = log_transform(values)
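As a side note (not part of the snippet above, but a standard NumPy function): `np.log1p` computes log(1 + x) directly and is more numerically accurate when values are close to zero. For this data it gives the same result:

```python
import numpy as np
import pandas as pd

values = pd.Series([10, 100, 1000, 10000])

# log(x + 1) and log1p(x) are mathematically equivalent;
# log1p avoids precision loss when x is very small
a = np.log(values + 1)
b = np.log1p(values)
print(np.allclose(a, b))  # → True
```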
This code applies a log transformation to each value in a data series to reduce skewness.
Identify the operations that repeat: loops, recursion, or array traversals. There is no explicit Python loop here, but `np.log` still visits every element internally via a vectorized loop in compiled code.
- Primary operation: Applying the log function to each element in the data.
- How many times: Once for every data point in the input series.
As the number of data points increases, the total work grows in direct proportion.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 log calculations |
| 100 | 100 log calculations |
| 1000 | 1000 log calculations |
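The linear pattern in the table can be checked empirically. The rough timing sketch below (sizes chosen for illustration; exact times vary by machine) should show elapsed time growing about 10x each time n grows 10x:

```python
import time
import numpy as np

def log_transform(data):
    return np.log(data + 1)

# Each 10x increase in n should cost roughly 10x the time (O(n))
for n in [10**5, 10**6, 10**7]:
    data = np.random.rand(n)
    start = time.perf_counter()
    result = log_transform(data)
    elapsed = time.perf_counter() - start
    print(f"n={n:>10,}: {elapsed:.4f} s")
```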
Pattern observation: Doubling the data size doubles the work needed.
Time Complexity: O(n)
This means the time to transform grows linearly with the number of data points.
Space Complexity: O(n) as well, since `np.log` allocates a new output array the same size as the input.
[X] Wrong: "Log transformation takes constant time no matter how much data there is."
[OK] Correct: Each data point needs its own calculation, so more data means more work.
Understanding how data transformations scale helps you explain your data preparation steps clearly and shows you know how to handle larger datasets efficiently.
"What if we applied the log transformation only to a random sample of the data? How would the time complexity change?"