Consider a dataset with 1 million numbers. Which code snippet runs faster and why?
```python
import time

numbers = list(range(1_000_000))

# Time the list comprehension.
start = time.time()
filtered1 = [x for x in numbers if x % 2 == 0]
end = time.time()
print(f"List comprehension time: {end - start:.4f} seconds")

# Time filter() with a lambda.
start = time.time()
filtered2 = list(filter(lambda x: x % 2 == 0, numbers))
end = time.time()
print(f"Filter function time: {end - start:.4f} seconds")
```
Think about how Python executes list comprehensions versus filter with lambda.
List comprehensions are generally faster than filter with a lambda because the comprehension evaluates its condition inline as bytecode, while filter must make a Python-level function call to the lambda for every element. The per-element call overhead adds up on large datasets.
Which data structure uses less memory when storing 1 million integers?
```python
import sys

list_data = list(range(1_000_000))
tuple_data = tuple(range(1_000_000))

# getsizeof reports the container's own size, not the integers it references.
print(sys.getsizeof(list_data))   # list: larger (mutable, may over-allocate)
print(sys.getsizeof(tuple_data))  # tuple: smaller fixed-size layout
```
Consider the difference between mutable and immutable types in Python.
Tuples use less memory than lists because they are immutable: a list reserves spare capacity so that appends stay fast, while a tuple allocates exactly the slots it needs. This difference can matter when holding many large collections in memory.
Which plot correctly shows the time taken by different sorting algorithms as data size increases?
```python
import matplotlib.pyplot as plt
import numpy as np

sizes = np.array([1000, 2000, 4000, 8000, 16000])

# Model the theoretical growth rates (scaled constants, not measurements).
quick_sort_times = sizes * np.log2(sizes) * 1e-6   # O(n log n)
bubble_sort_times = sizes ** 2 * 1e-7              # O(n^2)

plt.plot(sizes, quick_sort_times, label='Quick Sort')
plt.plot(sizes, bubble_sort_times, label='Bubble Sort')
plt.xlabel('Data Size')
plt.ylabel('Time (seconds)')
plt.title('Sorting Algorithm Time Complexity')
plt.legend()
plt.show()
```
Recall the time complexity: Quick Sort is O(n log n) on average, Bubble Sort is O(n²).
Quick Sort is more efficient for large datasets because its time complexity grows slower than Bubble Sort's quadratic growth.
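The plot above models the theoretical curves; the gap can also be measured directly. A minimal sketch comparing a hand-written bubble sort against Python's built-in `sorted` (Timsort, O(n log n)) on the same random data — the `bubble_sort` helper is illustrative, not from the original:

```python
import random
import time

def bubble_sort(a):
    """O(n^2) bubble sort: repeatedly swap adjacent out-of-order pairs."""
    a = a[:]  # work on a copy so the input is untouched
    n = len(a)
    for i in range(n):
        swapped = False
        for j in range(n - 1 - i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
                swapped = True
        if not swapped:  # already sorted: stop early
            break
    return a

random.seed(0)
data = [random.random() for _ in range(2000)]

start = time.perf_counter()
bubble_sorted = bubble_sort(data)
bubble_time = time.perf_counter() - start

start = time.perf_counter()
builtin_sorted = sorted(data)  # Timsort, O(n log n)
builtin_time = time.perf_counter() - start

assert bubble_sorted == builtin_sorted
print(f"Bubble sort: {bubble_time:.4f}s, built-in sort: {builtin_time:.6f}s")
```

Even at only 2,000 elements the quadratic algorithm is orders of magnitude slower, and the gap widens as the data grows.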
Why does this code take a long time to run on a large dataset?
```python
data = list(range(10_000_000))

result = []
for x in data:            # 10 million iterations of interpreted bytecode
    if x % 2 == 0:
        result.append(x)  # attribute lookup + method call on every hit
print(len(result))
```
Think about how Python handles loops and list comprehensions.
A for loop with append is slower than the equivalent list comprehension: each loop iteration looks up and calls `result.append` as an ordinary method, while a comprehension performs its appends with a specialized internal opcode. At ten million elements, that per-iteration overhead adds up.
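A minimal sketch timing the loop above against the equivalent comprehension side by side:

```python
import time

data = list(range(10_000_000))

# Explicit loop: each iteration re-executes interpreter bytecode and
# calls result.append via an ordinary attribute lookup.
start = time.perf_counter()
result_loop = []
for x in data:
    if x % 2 == 0:
        result_loop.append(x)
loop_time = time.perf_counter() - start

# List comprehension: same filter, but appends happen through a
# specialized opcode with no per-element method lookup.
start = time.perf_counter()
result_comp = [x for x in data if x % 2 == 0]
comp_time = time.perf_counter() - start

assert result_loop == result_comp
print(f"Loop: {loop_time:.3f}s  Comprehension: {comp_time:.3f}s")
```

Both produce the identical 5,000,000-element result; only the per-iteration overhead differs.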
You have a dataset with 10 million records. You want to calculate the average value of a column. Which approach is most efficient?
Consider memory usage and processing time when handling large data.
A generator expression yields one value at a time, so a running sum and count can be maintained in constant memory instead of materializing all 10 million values in a list first. This makes it the most efficient approach for large datasets.
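A minimal sketch of the streaming approach. The in-memory CSV with a `value` column is a hypothetical stand-in for the 10-million-record dataset; in practice `rows` would come from `csv.DictReader(open("data.csv"))`:

```python
import csv
import io

# Hypothetical dataset: a CSV with a "value" column holding 1..100.
csv_text = "value\n" + "\n".join(str(i) for i in range(1, 101))
rows = csv.DictReader(io.StringIO(csv_text))

# Generator expression: each row is parsed, converted, and discarded,
# so memory use stays constant no matter how large the file is.
total = 0.0
count = 0
for v in (float(row["value"]) for row in rows):
    total += v
    count += 1

print(total / count)  # → 50.5
```

Nothing here ever holds more than one row at a time, which is exactly why a generator beats building an intermediate list when the column has millions of entries.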