Bird
Raised Fist0
Snowflakecloud~5 mins

Loading from S3, Azure Blob, GCS in Snowflake - Cheat Sheet & Quick Revision

Choose your learning style10 modes available

Start learning this pattern below

Jump into concepts and practice - no test required

or
Recommended
Test this pattern10 questions across easy, medium, and hard to know if this pattern is strong
Recall & Review
beginner
What is the purpose of an external stage in Snowflake when loading data from cloud storage?
An external stage in Snowflake points to a location in cloud storage (like S3, Azure Blob, or GCS) and allows Snowflake to access files there for loading or unloading data.
Click to reveal answer
beginner
Which cloud storage services are commonly supported by Snowflake for loading data?
Snowflake supports Amazon S3, Microsoft Azure Blob Storage, and Google Cloud Storage (GCS) for loading data.
Click to reveal answer
intermediate
What is the role of credentials when creating an external stage in Snowflake?
Credentials authenticate Snowflake to access the cloud storage location securely, such as AWS IAM roles for S3, Azure SAS tokens for Blob, or GCP service account keys for GCS.
Click to reveal answer
beginner
How does Snowflake load data from an external stage into a table?
Snowflake uses the COPY INTO command to load data from files in the external stage into a target table, parsing the file format and applying any transformations if needed.
Click to reveal answer
beginner
Why is it important to define the file format when loading data from cloud storage into Snowflake?
Defining the file format tells Snowflake how to interpret the data files (like CSV, JSON, Parquet), ensuring correct parsing and loading of data into tables.
Click to reveal answer
Which command in Snowflake is used to load data from an external stage into a table?
ACOPY INTO
BLOAD DATA
CINSERT FROM
DIMPORT FILE
What type of credentials might you use to access an Amazon S3 bucket from Snowflake?
AAzure SAS Token
BGCP Service Account Key
CAWS IAM Role
DFTP Username
Which cloud storage service is NOT supported by Snowflake for external stages?
AMicrosoft Azure Blob Storage
BDropbox
CGoogle Cloud Storage
DAmazon S3
Why do you need to specify a file format when loading data from cloud storage into Snowflake?
ATo create a backup of the files
BTo encrypt the data during transfer
CTo compress the files
DTo tell Snowflake how to parse the data files
What is an external stage in Snowflake?
AA pointer to cloud storage location for data files
BA temporary table inside Snowflake
CA user role for access control
DA type of file format
Explain how Snowflake loads data from Amazon S3 into a table.
Think about the steps from storage to table.
You got /4 concepts.
    Describe the differences in credentials used for accessing Azure Blob Storage and Google Cloud Storage from Snowflake.
    Focus on the type of keys or tokens used.
    You got /3 concepts.

      Practice

      (1/5)
      1. What is the main purpose of using COPY INTO in Snowflake when loading data from S3, Azure Blob, or GCS?
      easy
      A. To load data files from cloud storage into Snowflake tables
      B. To export data from Snowflake to cloud storage
      C. To create a new cloud storage bucket
      D. To delete files from cloud storage

      Solution

      1. Step 1: Understand the role of COPY INTO

        The COPY INTO command is used in Snowflake to load data from external cloud storage into Snowflake tables.
      2. Step 2: Differentiate from other operations

        Exporting data, creating buckets, or deleting files are not done by COPY INTO. It specifically loads data into tables.
      3. Final Answer:

        To load data files from cloud storage into Snowflake tables -> Option A
      4. Quick Check:

        Loading data = COPY INTO [OK]
      Hint: COPY INTO loads data from cloud storage to tables [OK]
      Common Mistakes:
      • Confusing COPY INTO with export commands
      • Thinking COPY INTO manages cloud storage buckets
      • Assuming COPY INTO deletes files
      2. Which of the following is the correct syntax to load data from an S3 bucket into a Snowflake table named my_table?
      easy
      A. COPY INTO my_table FROM @my_s3_stage FILE_FORMAT = (TYPE = 'CSV');
      B. LOAD DATA INTO my_table FROM 's3://mybucket/data.csv';
      C. INSERT INTO my_table SELECT * FROM s3://mybucket/data.csv;
      D. IMPORT INTO my_table FROM @my_s3_stage FORMAT = CSV;

      Solution

      1. Step 1: Identify correct Snowflake COPY INTO syntax

        Snowflake uses COPY INTO table_name FROM @stage FILE_FORMAT = (TYPE = 'format') to load data.
      2. Step 2: Eliminate incorrect options

        LOAD DATA INTO is not valid Snowflake syntax. Direct INSERT INTO ... FROM s3:// paths are not supported. IMPORT INTO does not exist. The correct syntax is COPY INTO my_table FROM @my_s3_stage FILE_FORMAT = (TYPE = 'CSV');.
      3. Final Answer:

        COPY INTO my_table FROM @my_s3_stage FILE_FORMAT = (TYPE = 'CSV'); -> Option A
      4. Quick Check:

        COPY INTO + stage + file format = correct syntax [OK]
      Hint: COPY INTO + @stage + FILE_FORMAT is the right pattern [OK]
      Common Mistakes:
      • Using LOAD DATA instead of COPY INTO
      • Trying to SELECT directly from S3 path
      • Using IMPORT INTO which is invalid
      3. Given the following Snowflake command:
      COPY INTO sales FROM @azure_blob_stage FILE_FORMAT = (TYPE = 'JSON') ON_ERROR = 'CONTINUE';

      What happens if one file in the Azure Blob storage has invalid JSON data?
      medium
      A. The entire load fails and no data is loaded
      B. Snowflake automatically fixes the invalid JSON and loads all data
      C. Only the invalid file is skipped, and loading continues for others
      D. The command ignores the error and loads all files including invalid data

      Solution

      1. Step 1: Understand ON_ERROR = 'CONTINUE'

        This option tells Snowflake to skip files or rows with errors and continue loading the rest.
      2. Step 2: Apply to invalid JSON file

        The invalid JSON file will be skipped, but other valid files will load successfully.
      3. Final Answer:

        Only the invalid file is skipped, and loading continues for others -> Option C
      4. Quick Check:

        ON_ERROR = CONTINUE skips errors, loads rest [OK]
      Hint: ON_ERROR = CONTINUE skips bad files, loads others [OK]
      Common Mistakes:
      • Assuming entire load fails on one bad file
      • Thinking Snowflake auto-fixes JSON errors
      • Believing invalid data is loaded anyway
      4. You run this command to load data from Google Cloud Storage:
      COPY INTO customers FROM @gcs_stage FILE_FORMAT = (TYPE = 'CSV');

      But you get an error saying 'Storage integration not authorized'. What is the most likely cause?
      medium
      A. The GCS bucket is empty
      B. The CSV file format is incorrect
      C. The Snowflake table does not exist
      D. The storage integration lacks permission to access the GCS bucket

      Solution

      1. Step 1: Analyze the error message

        'Storage integration not authorized' means Snowflake cannot access the cloud storage due to permission issues.
      2. Step 2: Identify cause

        The storage integration must have proper permissions to read from the GCS bucket. Other options do not cause authorization errors.
      3. Final Answer:

        The storage integration lacks permission to access the GCS bucket -> Option D
      4. Quick Check:

        Authorization error = permission issue [OK]
      Hint: Authorization errors usually mean permission problems [OK]
      Common Mistakes:
      • Blaming file format for authorization errors
      • Assuming table existence causes storage errors
      • Ignoring permission setup for storage integration
      5. You want to load multiple CSV files from an S3 bucket into Snowflake, but only files with the prefix 2024/. Which COPY INTO command correctly filters these files?
      hard
      A. COPY INTO sales FROM @s3_stage FILE_FORMAT = (TYPE = 'CSV') WHERE filename LIKE '2024/%';
      B. COPY INTO sales FROM @s3_stage FILE_FORMAT = (TYPE = 'CSV') PATTERN = '^2024/.*';
      C. COPY INTO sales FROM @s3_stage FILE_FORMAT = (TYPE = 'CSV') FILES = ('2024/');
      D. COPY INTO sales FROM @s3_stage FILE_FORMAT = (TYPE = 'CSV') PATTERN = '2024/.*';

      Solution

      1. Step 1: Understand file filtering in COPY INTO

        Snowflake uses the PATTERN parameter with a regular expression to filter files by name or prefix.
      2. Step 2: Check regex correctness

        PATTERN = '^2024/.*' matches files starting exactly with '2024/'. PATTERN = '2024/.*' without the ^ may match files where '2024/' appears elsewhere in the path. The other options use invalid parameters like WHERE or FILES.
      3. Final Answer:

        COPY INTO sales FROM @s3_stage FILE_FORMAT = (TYPE = 'CSV') PATTERN = '^2024/.*'; -> Option B
      4. Quick Check:

        PATTERN with ^ prefix filters files correctly [OK]
      Hint: Use PATTERN with ^ prefix to filter file names [OK]
      Common Mistakes:
      • Omitting ^ in regex causing wrong files to load
      • Using WHERE or FILES incorrectly for filtering
      • Confusing file prefix with file list