Challenge - 5 Problems
Vectorized Operations Mastery
❓ Predict Output
intermediate
Output of vectorized addition vs loop addition
Consider a pandas DataFrame with a column of numbers. What does the following code print?
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5]})

# Vectorized addition
result_vectorized = df['A'] + 10

# Loop addition
result_loop = []
for x in df['A']:
    result_loop.append(x + 10)

print(result_vectorized.tolist())
print(result_loop)
💡 Hint
Think about how vectorized operations apply the operation to each element automatically.
Both vectorized addition and the loop add 10 to each element of the column, so both lines print the same list: [11, 12, 13, 14, 15].
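A short sketch that checks the equivalence directly and adds a list comprehension, which is the same per-element loop written more compactly. The comprehension variant is an addition for comparison and is not part of the quiz.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5]})

# Vectorized: one expression applied to the whole column
result_vectorized = (df['A'] + 10).tolist()

# Explicit loop: one Python-level addition per element
result_loop = []
for x in df['A']:
    result_loop.append(x + 10)

# List comprehension: still a Python loop, just more compact syntax
result_comprehension = [x + 10 for x in df['A']]

print(result_vectorized)                    # [11, 12, 13, 14, 15]
print(result_loop == result_vectorized)     # True
print(result_comprehension == result_loop)  # True
```

The three approaches produce the same values. They differ only in where the per-element work runs.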
❓ Data Output
intermediate
Performance difference in execution time
Which of the following statements correctly describes the typical performance difference between vectorized operations and loops in pandas?
💡 Hint
Think about how pandas and numpy are implemented under the hood.
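Vectorized operations hand the whole array to optimized, compiled code (largely NumPy's C loops) in a single call. They are therefore typically much faster than explicit Python loops, which run interpreted code for every element.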
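The following is a rough benchmark sketch using the standard-library `timeit` module. It compares a plain loop, `Series.apply` (which still calls a Python function once per element), and a vectorized expression. The helper names `add_loop`, `add_apply`, and `add_vectorized` are hypothetical. Absolute timings depend on hardware and on the pandas and NumPy versions, so treat the numbers as relative only.

```python
import timeit

import numpy as np
import pandas as pd

s = pd.Series(np.arange(100_000))

def add_loop(series):
    # Python-level loop: one interpreted addition per element
    out = []
    for x in series:
        out.append(x + 1)
    return out

def add_apply(series):
    # Series.apply calls the Python lambda once per element
    return series.apply(lambda x: x + 1)

def add_vectorized(series):
    # Single call; the per-element work happens in compiled NumPy code
    return series + 1

timings = {}
for name, func in [("loop", add_loop), ("apply", add_apply), ("vectorized", add_vectorized)]:
    # Best of 3 repeats of 5 calls each, to reduce measurement noise
    timings[name] = min(timeit.repeat(lambda: func(s), number=5, repeat=3))

for name, seconds in timings.items():
    print(f"{name:>10}: {seconds:.4f} s")
```

On typical machines the vectorized version should come out far ahead. `apply` usually lands closer to the loop than to the vectorized version, because it does not avoid the per-element Python call.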
🔧 Debug
advanced
Identify the error in loop vs vectorized operation
What error will the following code produce and why?
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]})

# Incorrect loop to add 5
result = []
for x in df['A']:
    result.append(x + '5')
print(result)
💡 Hint
Check the data types involved in the addition inside the loop.
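Iterating over df['A'] yields integers, and the loop tries to add the string '5' to each one. Python does not define + between an int and a str. The first iteration therefore raises a TypeError, and print(result) is never reached.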
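A minimal sketch that reproduces the error and shows two possible fixes. Which fix is right depends on intent, and the original snippet does not say which was meant. If numeric addition was intended, use the integer 5 (here in vectorized form). If string concatenation was intended, convert the column to strings first.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]})

# Reproduce the failure: int + str is not defined in Python
raised = False
try:
    result = []
    for x in df['A']:
        result.append(x + '5')
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# Fix 1: numeric addition with the integer 5, done vectorized
numeric_sum = (df['A'] + 5).tolist()
print(numeric_sum)  # [6, 7, 8]

# Fix 2: if concatenation was intended, convert to strings first
concatenated = (df['A'].astype(str) + '5').tolist()
print(concatenated)  # ['15', '25', '35']
```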
❓ Visualization
advanced
Visualizing speed difference between vectorized and loop operations
You run the following code to compare execution times. Which plot correctly shows the expected result?
import pandas as pd
import numpy as np
import time
import matplotlib.pyplot as plt

size = 100000
s = pd.Series(np.arange(size))

# Time the vectorized addition
start = time.time()
result_vec = s + 1
vec_time = time.time() - start

# Time the loop addition
start = time.time()
result_loop = []
for x in s:
    result_loop.append(x + 1)
loop_time = time.time() - start

plt.bar(['Vectorized', 'Loop'], [vec_time, loop_time])
plt.ylabel('Time in seconds')
plt.title('Execution time comparison')
plt.show()
💡 Hint
Think about how pandas optimizes vectorized operations compared to Python loops.
Vectorized operations typically run much faster than Python loops, so the Vectorized bar should be much shorter than the Loop bar.
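The quiz code times each version once with `time.time()`. A single measurement of a short operation is noisy and can include one-time overhead. The following sketch instead takes the best of several runs with `time.perf_counter()`, which Python provides for measuring short durations. The helper `best_time` and its `repeats` parameter are hypothetical. The resulting `vec_time` and `loop_time` can be passed to `plt.bar` exactly as in the quiz code.

```python
import time

import numpy as np
import pandas as pd

def best_time(func, repeats=5):
    """Hypothetical helper: fastest of several timings of func()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()  # clock intended for short durations
        func()
        best = min(best, time.perf_counter() - start)
    return best

s = pd.Series(np.arange(100_000))

vec_time = best_time(lambda: s + 1)
loop_time = best_time(lambda: [x + 1 for x in s])

print(f"vectorized: {vec_time:.6f} s")
print(f"loop:       {loop_time:.6f} s")
print(f"loop / vectorized: {loop_time / vec_time:.1f}x")
```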
🧠 Conceptual
expert
Why vectorized operations are preferred in pandas
Which of the following is NOT a reason why vectorized operations are preferred over loops in pandas?
💡 Hint
Consider what pandas does internally and what it does not do automatically.
Pandas vectorized operations do not automatically parallelize across CPU cores; parallelism requires extra tools.
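To illustrate the extra work that explicit parallelism involves, the following sketch splits a Series into chunks and hands them to a thread pool from the standard library's `concurrent.futures`. Each worker still runs an ordinary vectorized NumPy operation. The helper `add_one_in_parallel` and its `n_chunks` parameter are hypothetical. Any speedup depends on the operation, the array size, and whether the underlying NumPy routine releases the GIL. For a cheap operation like adding 1, the threading overhead may well outweigh the gain.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import pandas as pd

def add_one_in_parallel(series, n_chunks=4):
    """Hypothetical helper: add 1 to a Series chunk by chunk in a thread pool."""
    values = series.to_numpy()
    # Split the underlying array into roughly equal pieces
    chunks = np.array_split(values, n_chunks)
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        # Each worker runs a normal vectorized addition on its chunk;
        # pool.map returns results in the same order as the chunks
        results = list(pool.map(lambda chunk: chunk + 1, chunks))
    # Reassemble the pieces and restore the original index
    return pd.Series(np.concatenate(results), index=series.index)

s = pd.Series(np.arange(10))
parallel = add_one_in_parallel(s)
print(parallel.equals(s + 1))  # True
```

None of this happens automatically when you write `s + 1`. The chunking, the pool, and the reassembly are all the caller's responsibility.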