ML model training in Snowflake - Time & Space Complexity
When training a machine learning model in Snowflake, it is important to understand how training time changes as the data size grows. In other words, we want to know how the number of operations (the total work done) increases as we train on more rows.
Analyze the time complexity of the following operation sequence.
```sql
CREATE OR REPLACE MODEL my_model
OPTIONS(
  model_type = 'linear_regression',
  input_label_cols = ('target')
) AS
SELECT * FROM training_data;
```
This sequence creates and trains a linear regression model using all rows from the training_data table.
Identify the API calls, resource provisioning, and data transfers that repeat.
- Primary operation: Reading each row of training_data to compute model parameters.
- How many times: Once per row in the training_data table.
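The per-row work described above can be sketched in plain Python. This is an illustrative stand-in, not Snowflake's actual training engine: a single pass over a hypothetical list of `(x, y)` rows, accumulating the running sums that a one-feature least-squares fit needs. Each row is read exactly once, which is what makes the pass O(n).

```python
# Illustrative sketch (NOT Snowflake's real implementation): one pass over
# the rows of a hypothetical training set, accumulating running sums for a
# simple one-feature linear regression. Each row is read exactly once.
def train_linear_regression(rows):
    n = sum_x = sum_y = sum_xx = sum_xy = 0.0
    for x, y in rows:      # one read + one update per row -> O(n) total work
        n += 1
        sum_x += x
        sum_y += y
        sum_xx += x * x
        sum_xy += x * y
    # Closed-form least-squares slope and intercept from the accumulated sums.
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    intercept = (sum_y - slope * sum_x) / n
    return slope, intercept

# Usage: a tiny stand-in for the training_data table (y = 2x + 1).
rows = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
print(train_linear_regression(rows))  # -> (2.0, 1.0)
```

The key observation is that the loop body does a constant amount of work, so the total cost is driven entirely by the number of rows.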
As the number of rows in training_data increases, the work to read each row and update the model's running calculations grows proportionally.
| Input Size (n) | Approx. API Calls / Operations |
|---|---|
| 10 | About 10 row reads and calculations |
| 100 | About 100 row reads and calculations |
| 1000 | About 1000 row reads and calculations |
Pattern observation: The work grows directly with the number of rows; doubling rows roughly doubles the work.
Time Complexity: O(n)
This means the time to train the model grows in a straight line with the number of data rows.
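The pattern in the table above can be checked with a tiny counter sketch (an assumption for illustration, counting one unit of work per row processed):

```python
# Count one "operation" per row processed, mirroring the table above.
def operations_for(n):
    ops = 0
    for _ in range(n):  # one row read + model update counted as one operation
        ops += 1
    return ops

for n in (10, 100, 1000):
    print(n, operations_for(n))        # operations grow 1:1 with row count
print(operations_for(2000) / operations_for(1000))  # doubling rows -> 2.0
```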
[X] Wrong: "Training time stays the same no matter how much data we use."
[OK] Correct: More data means more rows to read and process, so training takes longer as data grows.
Understanding how training time grows with data size helps you explain performance and scalability clearly in real-world cloud projects.
"What if we added feature selection before training? How would the time complexity change?"
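One way to reason about this closing question: a filter-style feature selection step that scores each of d candidate columns against the label typically touches every row once per column, adding roughly O(n * d) work before the O(n) training pass. For a fixed number of columns that is still linear in n. A hedged sketch with hypothetical names (the scoring function here is a placeholder, not a real relevance measure):

```python
# Hypothetical sketch: scoring d candidate feature columns before training.
# Scoring each column requires a pass over all n rows, so the selection
# step alone does about n * d units of work -> O(n * d).
def score_features(rows, num_features):
    ops = 0
    scores = [0.0] * num_features
    for row in rows:                    # n rows
        for j in range(num_features):   # d columns per row
            scores[j] += abs(row[j])    # stand-in for a real relevance score
            ops += 1
    return scores, ops

rows = [[1.0, 2.0, 3.0]] * 10   # n = 10 rows, d = 3 features
_, ops = score_features(rows, 3)
print(ops)  # -> 30 operations: n * d = 10 * 3
```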