DataFrame API in Snowpark in Snowflake - Step-by-Step Execution

Process Flow - DataFrame API in Snowpark
1. Create Snowpark Session
2. Load Data into DataFrame
3. Apply Transformations
4. Execute Actions (collect/show)
5. Get Results or Write Back
This flow shows how you start a Snowpark session, load data into a DataFrame, transform it, then execute actions to get results or save data.
Execution Sample
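Python (Snowpark)
from snowflake.snowpark import Session

# connection_params: dict of Snowflake connection settings, assumed to be defined elsewhere
session = Session.builder.configs(connection_params).create()
df = session.table('EMPLOYEES')  # lazy reference to the table
df_filtered = df.filter(df['DEPARTMENT'] == 'SALES')  # lazy transformation
df_filtered.show()  # action: runs the query and prints the rows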
This code creates a session, loads the EMPLOYEES table into a DataFrame, filters rows where DEPARTMENT is SALES, then shows the filtered data.
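The sample above stops at show(), which only prints rows. The last step of the flow, getting results or writing them back, could look roughly like the sketch below. It assumes an already created Snowpark session; the function name collect_and_save_sales and the target table SALES_EMPLOYEES are hypothetical and not part of the original sample.

```python
def collect_and_save_sales(session, target_table="SALES_EMPLOYEES"):
    """Filter EMPLOYEES to the SALES department, fetch the rows, and write them back.

    `session` is assumed to be an existing snowflake.snowpark.Session.
    `target_table` is a hypothetical table name chosen for illustration.
    """
    df = session.table("EMPLOYEES")                       # lazy: no rows fetched yet
    df_filtered = df.filter(df["DEPARTMENT"] == "SALES")  # lazy: extends the plan

    # Action: runs the query and returns the rows to the client as a list.
    rows = df_filtered.collect()

    # Action: runs the query again inside Snowflake and stores the result as a table.
    df_filtered.write.mode("overwrite").save_as_table(target_table)
    return rows
```

With a real session, rows = collect_and_save_sales(session) would return the SALES rows and leave a SALES_EMPLOYEES table behind. Each action runs the plan separately, so this sketch queries EMPLOYEES twice.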
Process Table
Step | Action | DataFrame State | Result/Output
1 | Create Snowpark Session | No DataFrame yet | Session object ready
2 | Load EMPLOYEES table into df | df represents all EMPLOYEES rows | DataFrame defined (lazy, no data fetched yet)
3 | Filter df where DEPARTMENT == 'SALES' | df_filtered represents only SALES rows | Lazy transformation, no data fetched yet
4 | Call df_filtered.show() | Triggers execution | Displays filtered rows on screen
5 | Execution completes | DataFrame state unchanged | Filtered data visible to user
💡 Execution ends once show() has displayed the filtered data; transformations stay lazy until an action is called.
Status Tracker
Variable | Start | After Step 2 | After Step 3 | After Step 4
session | null | Session object created | Session object created | Session object created
df | null | DataFrame with all EMPLOYEES rows | DataFrame unchanged | DataFrame unchanged
df_filtered | null | null | DataFrame filtered to SALES rows | DataFrame unchanged
Key Moments - 2 Insights
Why does filtering the DataFrame not immediately show results?
Filtering is a lazy operation: it only adds to the query plan and does not run it until an action such as show() is called, as seen in steps 3 and 4 of the Process Table.
What happens when show() is called on a DataFrame?
show() triggers execution of all prior transformations and fetches the data to display, as shown in step 4 where the filtered data is output.
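The split between building a plan and running it can be illustrated with a toy model in plain Python. This is not how Snowpark is implemented; LazyFrame, fetch_employees, and the sample rows are all made up for illustration. It only shows the general idea that a transformation records a step while an action fetches the data.

```python
class LazyFrame:
    """Toy stand-in for a lazy DataFrame (illustration only, not Snowpark code)."""

    def __init__(self, source, plan=()):
        self._source = source  # callable that "fetches" rows when invoked
        self._plan = plan      # tuple of pending row predicates

    def filter(self, predicate):
        # Transformation: return a new frame with a longer plan; fetch nothing.
        return LazyFrame(self._source, self._plan + (predicate,))

    def collect(self):
        # Action: fetch the rows now, then apply every recorded step in order.
        rows = self._source()
        for predicate in self._plan:
            rows = [row for row in rows if predicate(row)]
        return rows


fetch_log = []  # one entry is appended every time the data source is read


def fetch_employees():
    """Hypothetical data source that records each read."""
    fetch_log.append("fetch")
    return [
        {"NAME": "Ana", "DEPARTMENT": "SALES"},
        {"NAME": "Ben", "DEPARTMENT": "HR"},
    ]


df = LazyFrame(fetch_employees)
df_filtered = df.filter(lambda row: row["DEPARTMENT"] == "SALES")
fetches_before_action = len(fetch_log)  # nothing has been read at this point
result = df_filtered.collect()          # the read happens here
```

In this toy model, the source is read only when collect() is called, which mirrors steps 3 and 4 of the Process Table: filter() changes the plan, and the action does the work.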
Visual Quiz - 3 Questions
Test your understanding
Look at the Process Table. What is the state of df_filtered after step 3?
A. Contains only SALES department rows
B. Empty DataFrame
C. Contains all EMPLOYEES rows
D. DataFrame with aggregated data
💡 Hint
Refer to the 'DataFrame State' column in the row for step 3 of the Process Table.
At which step does the DataFrame actually fetch data from Snowflake?
A. Step 2: Loading table
B. Step 4: Calling show()
C. Step 3: Filtering
D. Step 1: Creating session
💡 Hint
Check the 'Result/Output' column of the Process Table to see when data is displayed.
If you remove the show() call, what happens to the filtered DataFrame?
A. DataFrame becomes empty
B. Data is fetched and displayed automatically
C. No data is fetched or displayed
D. Session closes immediately
💡 Hint
Look at the lazy-transformation explanation in Key Moments and at step 3 of the Process Table.
Concept Snapshot
DataFrame API in Snowpark:
- Create a session to connect to Snowflake
- Load data into DataFrame (lazy, no data fetched yet)
- Apply transformations (filter, select, etc.) lazily
- Call actions (show, collect) to execute and fetch data
- Results can be displayed or written back
- Transformations build a plan, actions run it
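As a rough sketch of the snapshot above, several transformations can be chained before a single action runs them. The code assumes an existing Snowpark session; the function name sales_names, the max_rows parameter, and the NAME column are hypothetical, since the original only mentions the EMPLOYEES table and its DEPARTMENT column.

```python
def sales_names(session, max_rows=10):
    """Build a multi-step plan, then run it once with collect().

    `session` is assumed to be an existing snowflake.snowpark.Session.
    The NAME column is an assumption made for this example.
    """
    df = session.table("EMPLOYEES")
    plan = (
        df.filter(df["DEPARTMENT"] == "SALES")  # transformation
        .select("NAME", "DEPARTMENT")           # transformation
        .sort("NAME")                           # transformation
        .limit(max_rows)                        # transformation
    )
    # Nothing has run so far; collect() is the only action in this function.
    return plan.collect()
```

Because collect() is the only action, the four transformations are evaluated together when it is called rather than one at a time.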
Full Transcript
In Snowpark, you start by creating a session to connect to Snowflake. Then you load data into a DataFrame, which represents a table; no data is fetched at this point. You can apply transformations such as filtering rows, but these are lazy and only build a plan. When you call an action like show(), Snowpark runs the plan, fetches the data, and displays the results. Because nothing runs until an action is called, the chain of transformations is translated to SQL and executed inside Snowflake, so only the result of the action is returned to the client. This keeps work on large tables efficient.