# Scaling and Normalization in Python Data Analysis: Time & Space Complexity
When we scale or normalize data, we change its values to a common range or scale.
We want to know how the time to do this changes as the data size grows.
Analyze the time complexity of the following code snippet.
```python
import numpy as np

def min_max_scale(data):
    # One pass over the array to find the minimum, one to find the maximum
    min_val = np.min(data)
    max_val = np.max(data)
    # One more pass to rescale every element into [0, 1]
    scaled = (data - min_val) / (max_val - min_val)
    return scaled

sample_data = np.array([10, 20, 30, 40, 50])
scaled_data = min_max_scale(sample_data)
```
This code rescales an array of numbers to the range [0, 1] using min-max scaling.
Identify the operations that repeat: loops, recursion, or array traversals.
- Primary operation: Scanning the data array to find minimum and maximum values.
- How many times: Each element is visited three times: once to find the min, once to find the max, and once more during scaling.
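To make the three passes explicit, here is a loop-based pure-Python equivalent (a sketch for counting operations; the NumPy version performs the same passes in vectorized C code):

```python
def min_max_scale_loops(data):
    # Pass 1: find the minimum (about n comparisons)
    min_val = data[0]
    for x in data:
        if x < min_val:
            min_val = x
    # Pass 2: find the maximum (about n comparisons)
    max_val = data[0]
    for x in data:
        if x > max_val:
            max_val = x
    # Pass 3: rescale every element (n subtractions and divisions)
    return [(x - min_val) / (max_val - min_val) for x in data]

print(min_max_scale_loops([10, 20, 30, 40, 50]))
# [0.0, 0.25, 0.5, 0.75, 1.0]
```

Three passes of n steps each is 3n operations, which is still O(n): constant factors are dropped in big-O notation.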
As the data size grows, the number of operations grows roughly in direct proportion.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 30 (3 passes over 10 elements) |
| 100 | About 300 (3 passes over 100 elements) |
| 1000 | About 3000 (3 passes over 1000 elements) |
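You can check the linear trend empirically with a rough timing sketch (the exact times are machine-dependent, so treat the numbers as illustrative only):

```python
import time
import numpy as np

def min_max_scale(data):
    min_val = np.min(data)
    max_val = np.max(data)
    return (data - min_val) / (max_val - min_val)

# Time the scaler at increasing input sizes; each 10x jump in n
# should produce roughly a 10x jump in elapsed time.
for n in (10_000, 100_000, 1_000_000):
    data = np.random.rand(n)
    start = time.perf_counter()
    min_max_scale(data)
    elapsed = time.perf_counter() - start
    print(f"n={n:>9,}: {elapsed:.6f} s")
</```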
Pattern observation: The operations increase linearly as the input size increases.
Time Complexity: O(n)
This means the time to scale data grows directly with the number of data points.
[X] Wrong: "Scaling data takes constant time no matter how big the data is."
[OK] Correct: Every data point must be processed, so time grows as data grows.
Understanding how data scaling time grows helps you explain efficiency when preparing data for models.
"What if we used a scaling method that requires sorting the data? How would the time complexity change?"