
Why Snowpark Brings Code to the Data in Snowflake

Introduction
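Moving large amounts of data between storage and compute can be slow and costly. Snowpark addresses this by pushing your data-processing logic into Snowflake, where the data already lives: DataFrame operations are translated to SQL and executed on a Snowflake warehouse, so only the results travel back to your program. Typical situations where Snowpark fits: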
When you want to process large datasets without downloading them to your local machine.
When you need to run complex data transformations close to the data for faster results.
When you want to use familiar programming languages like Python or Java to work with your data inside Snowflake.
When you want to reduce network costs by minimizing data transfer.
When you want to build data pipelines that run efficiently inside your cloud data warehouse.
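To make the idea of "bringing code to the data" concrete, here is a toy model in plain Python. It is not Snowpark's implementation; the `LazyTable` class and its methods are hypothetical names invented for this sketch. It only illustrates the general pattern: operations are recorded rather than executed, and a single SQL statement is produced for the warehouse to run.

```python
class LazyTable:
    """Toy stand-in for a lazily evaluated table reference (not a Snowpark class)."""

    def __init__(self, name, filters=()):
        self.name = name
        self.filters = tuple(filters)

    def filter(self, condition):
        # No data is read here; the condition is only recorded.
        return LazyTable(self.name, self.filters + (condition,))

    def sum_sql(self, column):
        # Build the one statement that would be sent to the warehouse.
        where = f" WHERE {' AND '.join(self.filters)}" if self.filters else ""
        return f"SELECT SUM({column}) FROM {self.name}{where}"


query = LazyTable("sales_data").filter("region = 'North'").sum_sql("amount")
print(query)  # SELECT SUM(amount) FROM sales_data WHERE region = 'North'
```

In this model the client never holds any rows: it ships a short query and would receive a single number back, which is the property that makes the approach cheap on the network.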
Commands
Connect to Snowflake with the SnowSQL CLI, specifying your account, user, warehouse, database, and schema. You will use this session to create the sample data that the Snowpark script processes.
Terminal
snowsql -a myaccount -u myuser -w mywarehouse -d mydatabase -s myschema
Expected Output
* SnowSQL * v<version>
Type SQL statements or !help
-a - Specifies your Snowflake account name
-u - Specifies your Snowflake username
-w - Specifies the warehouse to use
-d - Specifies the database to use
-s - Specifies the schema to use
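A Snowpark script needs the same five settings, plus credentials, passed as a dictionary of connection parameters. The sketch below shows one way to assemble them. It assumes the values are stored in environment variables named `SNOWFLAKE_ACCOUNT`, `SNOWFLAKE_USER`, and so on (names chosen for this example, not a Snowflake convention) and that password authentication is in use; your account may require a different method.

```python
import os

# Snowpark connection keys, mirroring the SnowSQL flags -a, -u, -w, -d, -s.
# The SNOWFLAKE_* variable names are an assumption made for this sketch.
ENV_TO_KEY = {
    "SNOWFLAKE_ACCOUNT": "account",
    "SNOWFLAKE_USER": "user",
    "SNOWFLAKE_PASSWORD": "password",
    "SNOWFLAKE_WAREHOUSE": "warehouse",
    "SNOWFLAKE_DATABASE": "database",
    "SNOWFLAKE_SCHEMA": "schema",
}


def connection_parameters(env=None):
    """Build the Snowpark connection dictionary from environment variables."""
    env = os.environ if env is None else env
    missing = [name for name in ENV_TO_KEY if not env.get(name)]
    if missing:
        raise ValueError("Missing settings: " + ", ".join(missing))
    return {key: env[name] for name, key in ENV_TO_KEY.items()}


def create_session(params):
    """Open a Snowpark session (requires the snowflake-snowpark-python package)."""
    # Imported here so the helper above works without the package installed.
    from snowflake.snowpark import Session

    return Session.builder.configs(params).create()
```

Failing early on a missing setting gives a clearer error than letting the connection attempt fail later.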
Create a sample table inside Snowflake to hold sales data. Snowpark will process this table in place. Run the statement at the SnowSQL prompt.
Terminal
CREATE OR REPLACE TABLE sales_data (id INT, amount FLOAT, region STRING);
Expected Output
Statement executed successfully.
Insert sample rows into the sales_data table so there is data to process.
Terminal
INSERT INTO sales_data VALUES (1, 100.0, 'North'), (2, 150.5, 'South'), (3, 200.0, 'East');
Expected Output
3 rows inserted.
Run a Python script that uses Snowpark to total the amount column of the sales_data table. The aggregation runs inside Snowflake; only the result is returned to the script.
Terminal
python3 snowpark_example.py
Expected Output
Total sales amount: 450.5
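The contents of snowpark_example.py are not shown in this tutorial, so the following is only a minimal sketch of what such a script might look like. It assumes the `snowflake-snowpark-python` package is installed, that credentials are supplied through `SNOWFLAKE_*` environment variables (hypothetical names chosen here), and that password authentication is enabled for your account.

```python
import os


def total_sales(session, table_name="sales_data"):
    """Sum the amount column inside Snowflake and return the single result."""
    # table() and agg() only describe the query; collect() runs it on the
    # warehouse and brings back one row instead of the whole table.
    rows = session.table(table_name).agg(("amount", "sum")).collect()
    return rows[0][0]


def main():
    # Assumed variable names: SNOWFLAKE_ACCOUNT, SNOWFLAKE_USER, and so on.
    keys = ("account", "user", "password", "warehouse", "database", "schema")
    params = {k: os.environ.get(f"SNOWFLAKE_{k.upper()}") for k in keys}
    if not all(params.values()):
        print("Set the SNOWFLAKE_* environment variables before running.")
        return

    # Imported late so the module loads even without the package installed.
    from snowflake.snowpark import Session

    session = Session.builder.configs(params).create()
    try:
        print(f"Total sales amount: {total_sales(session)}")
    finally:
        session.close()


if __name__ == "__main__":
    main()
```

With the three sample rows inserted earlier, the sum is 100.0 + 150.5 + 200.0, so the printed line should match the expected output above.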
Key Concept

If you remember nothing else from this pattern, remember: Snowpark runs your code inside Snowflake where the data lives, avoiding slow and costly data movement.

Common Mistakes
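Mistake: Downloading large datasets to your local machine before processing them.
Why it hurts: Pulling every row over the network is slow and can incur data transfer costs.
Fix: Use Snowpark so the processing runs inside Snowflake and only the results are returned.
Mistake: Not specifying the correct warehouse, database, or schema before running Snowpark code.
Why it hurts: Without an active warehouse, Snowflake has no compute to run your queries, and without the right database and schema, table names such as sales_data will not resolve.
Fix: Always connect with the right warehouse, database, and schema settings.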
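One way to catch the second mistake early is to check the session's context before running any queries. The sketch below assumes a Snowpark `Session` object and uses its `get_current_warehouse()`, `get_current_database()`, and `get_current_schema()` methods; the helper names `missing_context` and `require_context` are hypothetical.

```python
def missing_context(session):
    """Return the names of context settings the session has not selected."""
    current = {
        "warehouse": session.get_current_warehouse(),
        "database": session.get_current_database(),
        "schema": session.get_current_schema(),
    }
    return [name for name, value in current.items() if not value]


def require_context(session):
    """Raise a clear error if the session lacks a warehouse, database, or schema."""
    missing = missing_context(session)
    if missing:
        raise RuntimeError("Session has no active " + ", ".join(missing))
```

Calling `require_context(session)` right after creating the session turns a confusing query failure into an immediate, readable error.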
Summary
Connect to Snowflake with SnowSQL specifying account, user, warehouse, database, and schema.
Create and populate a table inside Snowflake to hold your data.
Run Snowpark code in Python to process data directly inside Snowflake without moving it.
This approach saves time and network costs by bringing code to the data.