COPY INTO command in Snowflake - Time & Space Complexity
When loading data into Snowflake with the COPY INTO command, it is important to understand how the running time grows as the data size increases: in other words, how the number of operations scales as we load more data.
Analyze the time complexity of the following operation sequence.
```sql
COPY INTO my_table
FROM @my_stage/data_files
FILE_FORMAT = (TYPE = 'CSV' FIELD_DELIMITER = ',' SKIP_HEADER = 1)
ON_ERROR = 'CONTINUE';
```
This command loads multiple CSV files from a stage into a table, skipping the header row and continuing on errors.
Identify the repeated work: the API calls, resource provisioning, and data transfers that occur once per unit of input.
- Primary operation: Reading and parsing each file from the stage and inserting data into the table.
- How many times: Once per file, repeated for all files in the stage folder.
As the number of files or total data size grows, the number of read and insert operations grows roughly in proportion.
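This per-file pattern can be sketched as a small model (a hypothetical simulation, not Snowflake's actual internals): each staged file costs one read-and-insert operation, so the operation count grows linearly with the file count, matching the table below.

```python
def count_copy_operations(num_files: int) -> int:
    """Model COPY INTO as one read-and-insert operation per staged file."""
    operations = 0
    for _ in range(num_files):
        operations += 1  # read, parse, and insert one CSV file from the stage
    return operations

# Doubling the input doubles the work: the signature of O(n) growth.
for n in (10, 100, 1000):
    print(f"{n} files -> {count_copy_operations(n)} operations")
```

Running the loop prints counts that track the input size exactly, which is the linear pattern summarized in the table.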
| Input Size (n) | Approx. API Calls/Operations |
|---|---|
| 10 files | 10 read and insert operations |
| 100 files | 100 read and insert operations |
| 1000 files | 1000 read and insert operations |
Pattern observation: The operations increase linearly with the number of files or data size.
Time Complexity: O(n)
This means the total work done by the COPY INTO command grows in direct proportion to the amount of data being loaded. (A larger warehouse can load files in parallel, which shortens wall-clock time by a constant factor, but the overall work is still O(n).)
[X] Wrong: "COPY INTO runs in constant time no matter how much data is loaded."
[OK] Correct: The command must read and process each file, so more data means more work and longer time.
Understanding how data loading time grows helps you design efficient pipelines and explain performance in real projects.
"What if we changed the COPY INTO command to load compressed files instead? How would the time complexity change?"