Vectorized operations vs loops in Pandas - Visual Side-by-Side Comparison

Concept Flow - Vectorized operations vs loops
Start with DataFrame -> Choose operation -> Use loop -> Iterate rows -> Update values -> Compare speed & simplicity -> End
This flow shows two ways to update data: looping row by row or using vectorized operations that work on whole columns at once.
Execution Sample
Pandas
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]})

# Loop method
for i in range(len(df)):
    df.loc[i, 'B'] = df.loc[i, 'A'] * 2

# Vectorized method
df['C'] = df['A'] * 2
This code creates a DataFrame and adds two new columns, B and C, by doubling column A: first with a loop, then with a vectorized operation.
Execution Table
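| Step | i | df.loc[i, 'A'] | Action | df.loc[i, 'B'] | df['C'] |
|------|---|----------------|--------|----------------|---------|
| 1 | 0 | 1 | Loop: Multiply 1 * 2 | 2 | - |
| 2 | 1 | 2 | Loop: Multiply 2 * 2 | 4 | - |
| 3 | 2 | 3 | Loop: Multiply 3 * 2 | 6 | - |
| 4 | - | - | Vectorized: Multiply entire 'A' by 2 | - | [2, 4, 6] |
| 5 | - | - | End of operations | B column filled | C column filled |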
💡 Loop ends after i=2 (last index). Vectorized operation applies to whole column at once.
Variable Tracker
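| Variable | Start | After 1 | After 2 | After 3 | After vectorized | Final |
|----------|-------|---------|---------|---------|------------------|-------|
| i | N/A | 0 | 1 | 2 | N/A | N/A |
| df.loc[i, 'B'] | NaN | 2 | 4 | 6 | 6 | 6 |
| df['C'] | NaN | NaN | NaN | NaN | [2, 4, 6] | [2, 4, 6] |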
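The steps in the execution table can be reproduced with a short trace. This is a minimal sketch: the `trace` list and the `step` counter are names introduced here for illustration and are not part of the lesson's code.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]})

# trace is a hypothetical helper: one (step, i, A value, B value) tuple per iteration
trace = []
for step, i in enumerate(range(len(df)), start=1):
    df.loc[i, 'B'] = df.loc[i, 'A'] * 2
    trace.append((step, i, int(df.loc[i, 'A']), float(df.loc[i, 'B'])))

# Step 4: the vectorized method fills the whole column in one operation
df['C'] = df['A'] * 2

for row in trace:
    print(row)
print(df['C'].tolist())
```

Each tuple corresponds to one of steps 1-3, and the final list corresponds to step 4.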
Key Moments - 3 Insights
Why is the vectorized operation faster than the loop?
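Vectorized operations work on the whole column at once in optimized compiled code (pandas delegates the arithmetic to NumPy), while a loop processes one row at a time in Python and pays for a separate .loc lookup and assignment on every iteration. Compare steps 1-3 with step 4 in the execution table.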
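One way to see the speed difference is to time both methods on a larger DataFrame. The sketch below assumes a hypothetical size of 2,000 rows (`n_rows` is not from the lesson) and uses `time.perf_counter` for a rough measurement. Exact timings depend on the machine and the pandas version, so no specific numbers are claimed here.

```python
import time

import numpy as np
import pandas as pd

n_rows = 2_000  # hypothetical size, kept small so the loop finishes quickly
df = pd.DataFrame({'A': np.arange(n_rows)})

# Loop method: one .loc lookup and one .loc assignment per row
start = time.perf_counter()
for i in range(len(df)):
    df.loc[i, 'B'] = df.loc[i, 'A'] * 2
loop_seconds = time.perf_counter() - start

# Vectorized method: a single operation on the whole column
start = time.perf_counter()
df['C'] = df['A'] * 2
vectorized_seconds = time.perf_counter() - start

print(f"loop:       {loop_seconds:.4f} s")
print(f"vectorized: {vectorized_seconds:.4f} s")
```

For more careful measurements, the standard library's `timeit` module repeats each run and reduces noise.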
Does the loop method create the new column 'B' immediately?
Not as a finished column. 'B' does not exist before the loop. The first assignment creates it, with a value in row 0 and NaN in the rows the loop has not reached yet; each later iteration fills in one more row. See the df.loc[i, 'B'] row in the variable tracker.
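To inspect how 'B' comes into existence, it can help to run only the first iteration of the loop by hand. A minimal sketch using the same three-row DataFrame as the lesson:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]})
print('B' in df.columns)  # checks whether the column exists before any assignment

# Run only the first loop iteration (i = 0)
df.loc[0, 'B'] = df.loc[0, 'A'] * 2

print('B' in df.columns)  # checks again after the first assignment
print(df)                 # rows 1 and 2 of 'B' have not been assigned yet
```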
Are the results from loop and vectorized operations the same?
Yes, both methods produce the same doubled values in the new columns 'B' and 'C'. Compare the df.loc[i, 'B'] and df['C'] columns in the execution table.
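The equality can also be checked in code. The sketch below compares the two columns element by element; `same_values` is a name introduced here for illustration. It also prints the dtypes, which may differ: a column filled row by row starts out with missing values, so pandas typically stores it as floats, while the vectorized result keeps the integer type of 'A'.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]})

# Loop method
for i in range(len(df)):
    df.loc[i, 'B'] = df.loc[i, 'A'] * 2

# Vectorized method
df['C'] = df['A'] * 2

# Element-wise comparison, reduced to a single True/False
same_values = (df['B'] == df['C']).all()
print(same_values)
print(df.dtypes)
```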
Visual Quiz - 3 Questions
Test your understanding
Look at the execution table, what is the value of df.loc[1, 'B'] after step 2?
A. 4
B. 2
C. 6
D. NaN
💡 Hint
Check the row for Step 2 in the execution table, under the df.loc[i, 'B'] column.
At which step does the vectorized operation update the entire 'C' column?
A. Step 2
B. Step 3
C. Step 4
D. Step 5
💡 Hint
Look for the step mentioning vectorized operation in execution_table.
If we remove the loop and only use vectorized operation, what changes in variable_tracker?
A. df['C'] becomes NaN
B. df.loc[i, 'B'] remains NaN throughout
C. df.loc[i, 'B'] gets filled as in the loop
D. The variable i increments more times
💡 Hint
df.loc[i, 'B'] is only updated inside the loop; see the variable tracker.
Concept Snapshot
Vectorized operations apply functions to whole columns or arrays at once.
Loops process data row by row, which is slower.
Use vectorized methods in pandas for faster, simpler code.
Loops can be used but are less efficient.
Vectorized code is cleaner and leverages pandas optimizations.
Full Transcript
This lesson compares two ways to update data in pandas: using loops and vectorized operations. We start with a DataFrame with one column 'A'. The loop method goes row by row, multiplying each value by 2 and storing it in a new column 'B'. The vectorized method multiplies the entire 'A' column by 2 at once and stores it in 'C'. The execution table shows each step, variable values, and actions. The variable tracker follows how variables change after each step. Key moments clarify why vectorized operations are faster and how columns are created. The quiz tests understanding of values at specific steps and the difference between methods. The snapshot summarizes that vectorized operations are faster and simpler than loops in pandas.