Bird
Raised Fist0
Snowflakecloud~10 mins

Loading from S3, Azure Blob, GCS in Snowflake - Step-by-Step Execution

Choose your learning style10 modes available

Start learning this pattern below

Jump into concepts and practice - no test required

or
Recommended
Test this pattern10 questions across easy, medium, and hard to know if this pattern is strong
Process Flow - Loading from S3, Azure Blob, GCS
Start: Define Storage Integration
Create External Stage
Run COPY INTO Command
Data Loaded into Snowflake Table
End
This flow shows how Snowflake loads data from cloud storage by defining access, creating a stage, and copying data into a table.
Execution Sample
Snowflake
CREATE OR REPLACE STAGE my_s3_stage
  URL='s3://mybucket/data/'
  STORAGE_INTEGRATION = my_s3_integration;

COPY INTO my_table FROM @my_s3_stage FILE_FORMAT = (TYPE = 'CSV');
This code sets up access to an S3 bucket and loads CSV files into a Snowflake table.
Process Table
StepActionEvaluationResult
1Define storage integration for S3Integration created with credentialsAccess to S3 bucket authorized
2Create external stage pointing to S3 URLStage created with URL and integrationStage 'my_s3_stage' ready to use
3Run COPY INTO command from stageFiles found in S3 pathData copied into 'my_table'
4Verify data loadQuery table row countRows match files loaded
5EndNo more files or commandsLoading complete
💡 All files from the cloud storage stage are loaded into the Snowflake table successfully.
Status Tracker
VariableStartAfter Step 1After Step 2After Step 3Final
storage_integrationundefinedcreated with S3 credentialscreatedcreatedcreated
external_stageundefinedundefinedcreated with S3 URLcreatedcreated
copy_statusnot startednot startednot startedrunningcompleted
table_data_rows000increasingfinal row count
Key Moments - 3 Insights
Why do we need a storage integration before creating the stage?
The storage integration securely stores credentials and permissions to access the cloud storage. Without it, Snowflake cannot read data from S3, Azure Blob, or GCS. See execution_table step 1.
What happens if the COPY INTO command runs but no files are found in the stage?
The COPY INTO command completes but loads zero rows because no files exist at the stage path. This is shown in execution_table step 3 where files are checked.
How do we confirm data loaded correctly into the Snowflake table?
By querying the table row count after the COPY INTO command finishes, as shown in execution_table step 4.
Visual Quiz - 3 Questions
Test your understanding
Look at the execution_table, what is the result after step 2?
AStage 'my_s3_stage' ready to use
BData copied into 'my_table'
CIntegration created with credentials
DRows match files loaded
💡 Hint
Check the 'Result' column for step 2 in execution_table.
At which step does the data start copying into the Snowflake table?
AStep 1
BStep 3
CStep 2
DStep 4
💡 Hint
Look for the 'Action' describing the COPY INTO command in execution_table.
If the storage integration is missing, what will happen at step 2?
ATable will be created automatically
BData will load successfully anyway
CStage creation will fail due to missing credentials
DCOPY INTO command will skip files
💡 Hint
Refer to key_moments about why storage integration is needed before stage creation.
Concept Snapshot
Loading from S3, Azure Blob, GCS in Snowflake:
1. Create a storage integration with cloud credentials.
2. Create an external stage pointing to the cloud storage URL using the integration.
3. Use COPY INTO command from the stage to load data into a Snowflake table.
4. Verify data loaded by querying the table.
This process securely and efficiently loads cloud data into Snowflake.
Full Transcript
This visual execution shows how Snowflake loads data from cloud storage services like S3, Azure Blob, and GCS. First, a storage integration is created to securely store access credentials. Next, an external stage is created that points to the cloud storage location using this integration. Then, the COPY INTO command runs to load files from the stage into a Snowflake table. Finally, the data load is verified by checking the table row count. Each step is tracked to show how variables like storage integration, stage, copy status, and table data rows change. Common confusions such as the need for storage integration and what happens if no files are found are addressed. Quiz questions test understanding of the execution steps and outcomes.

Practice

(1/5)
1. What is the main purpose of using COPY INTO in Snowflake when loading data from S3, Azure Blob, or GCS?
easy
A. To load data files from cloud storage into Snowflake tables
B. To export data from Snowflake to cloud storage
C. To create a new cloud storage bucket
D. To delete files from cloud storage

Solution

  1. Step 1: Understand the role of COPY INTO

    The COPY INTO command is used in Snowflake to load data from external cloud storage into Snowflake tables.
  2. Step 2: Differentiate from other operations

    Exporting data, creating buckets, or deleting files are not done by COPY INTO. It specifically loads data into tables.
  3. Final Answer:

    To load data files from cloud storage into Snowflake tables -> Option A
  4. Quick Check:

    Loading data = COPY INTO [OK]
Hint: COPY INTO loads data from cloud storage to tables [OK]
Common Mistakes:
  • Confusing COPY INTO with export commands
  • Thinking COPY INTO manages cloud storage buckets
  • Assuming COPY INTO deletes files
2. Which of the following is the correct syntax to load data from an S3 bucket into a Snowflake table named my_table?
easy
A. COPY INTO my_table FROM @my_s3_stage FILE_FORMAT = (TYPE = 'CSV');
B. LOAD DATA INTO my_table FROM 's3://mybucket/data.csv';
C. INSERT INTO my_table SELECT * FROM s3://mybucket/data.csv;
D. IMPORT INTO my_table FROM @my_s3_stage FORMAT = CSV;

Solution

  1. Step 1: Identify correct Snowflake COPY INTO syntax

    Snowflake uses COPY INTO table_name FROM @stage FILE_FORMAT = (TYPE = 'format') to load data.
  2. Step 2: Eliminate incorrect options

    LOAD DATA INTO is not valid Snowflake syntax. Direct INSERT INTO ... FROM s3:// paths are not supported. IMPORT INTO does not exist. The correct syntax is COPY INTO my_table FROM @my_s3_stage FILE_FORMAT = (TYPE = 'CSV');.
  3. Final Answer:

    COPY INTO my_table FROM @my_s3_stage FILE_FORMAT = (TYPE = 'CSV'); -> Option A
  4. Quick Check:

    COPY INTO + stage + file format = correct syntax [OK]
Hint: COPY INTO + @stage + FILE_FORMAT is the right pattern [OK]
Common Mistakes:
  • Using LOAD DATA instead of COPY INTO
  • Trying to SELECT directly from S3 path
  • Using IMPORT INTO which is invalid
3. Given the following Snowflake command:
COPY INTO sales FROM @azure_blob_stage FILE_FORMAT = (TYPE = 'JSON') ON_ERROR = 'CONTINUE';

What happens if one file in the Azure Blob storage has invalid JSON data?
medium
A. The entire load fails and no data is loaded
B. Snowflake automatically fixes the invalid JSON and loads all data
C. Only the invalid file is skipped, and loading continues for others
D. The command ignores the error and loads all files including invalid data

Solution

  1. Step 1: Understand ON_ERROR = 'CONTINUE'

    This option tells Snowflake to skip files or rows with errors and continue loading the rest.
  2. Step 2: Apply to invalid JSON file

    The invalid JSON file will be skipped, but other valid files will load successfully.
  3. Final Answer:

    Only the invalid file is skipped, and loading continues for others -> Option C
  4. Quick Check:

    ON_ERROR = CONTINUE skips errors, loads rest [OK]
Hint: ON_ERROR = CONTINUE skips bad files, loads others [OK]
Common Mistakes:
  • Assuming entire load fails on one bad file
  • Thinking Snowflake auto-fixes JSON errors
  • Believing invalid data is loaded anyway
4. You run this command to load data from Google Cloud Storage:
COPY INTO customers FROM @gcs_stage FILE_FORMAT = (TYPE = 'CSV');

But you get an error saying 'Storage integration not authorized'. What is the most likely cause?
medium
A. The GCS bucket is empty
B. The CSV file format is incorrect
C. The Snowflake table does not exist
D. The storage integration lacks permission to access the GCS bucket

Solution

  1. Step 1: Analyze the error message

    'Storage integration not authorized' means Snowflake cannot access the cloud storage due to permission issues.
  2. Step 2: Identify cause

    The storage integration must have proper permissions to read from the GCS bucket. Other options do not cause authorization errors.
  3. Final Answer:

    The storage integration lacks permission to access the GCS bucket -> Option D
  4. Quick Check:

    Authorization error = permission issue [OK]
Hint: Authorization errors usually mean permission problems [OK]
Common Mistakes:
  • Blaming file format for authorization errors
  • Assuming table existence causes storage errors
  • Ignoring permission setup for storage integration
5. You want to load multiple CSV files from an S3 bucket into Snowflake, but only files with the prefix 2024/. Which COPY INTO command correctly filters these files?
hard
A. COPY INTO sales FROM @s3_stage FILE_FORMAT = (TYPE = 'CSV') WHERE filename LIKE '2024/%';
B. COPY INTO sales FROM @s3_stage FILE_FORMAT = (TYPE = 'CSV') PATTERN = '^2024/.*';
C. COPY INTO sales FROM @s3_stage FILE_FORMAT = (TYPE = 'CSV') FILES = ('2024/');
D. COPY INTO sales FROM @s3_stage FILE_FORMAT = (TYPE = 'CSV') PATTERN = '2024/.*';

Solution

  1. Step 1: Understand file filtering in COPY INTO

    Snowflake uses the PATTERN parameter with a regular expression to filter files by name or prefix.
  2. Step 2: Check regex correctness

    PATTERN = '^2024/.*' matches files starting exactly with '2024/'. PATTERN = '2024/.*' without the ^ may match files where '2024/' appears elsewhere in the path. The other options use invalid parameters like WHERE or FILES.
  3. Final Answer:

    COPY INTO sales FROM @s3_stage FILE_FORMAT = (TYPE = 'CSV') PATTERN = '^2024/.*'; -> Option B
  4. Quick Check:

    PATTERN with ^ prefix filters files correctly [OK]
Hint: Use PATTERN with ^ prefix to filter file names [OK]
Common Mistakes:
  • Omitting ^ in regex causing wrong files to load
  • Using WHERE or FILES incorrectly for filtering
  • Confusing file prefix with file list