Data Analysis Python · ~20 mins

Why efficiency matters with large datasets in Data Analysis Python - Challenge Your Understanding

Challenge - 5 Problems
🎖️ Efficiency Mastery in Large Datasets
Get all challenges correct to earn this badge!
Test your skills under time pressure!
ā“ Predict Output
intermediate
2:00remaining
Output of inefficient vs efficient data filtering

Consider a dataset with 1 million numbers. Which code snippet runs faster and why?

Data Analysis Python
import time

numbers = list(range(1_000_000))

start = time.time()
filtered1 = [x for x in numbers if x % 2 == 0]
end = time.time()
print(f"List comprehension time: {end - start:.4f} seconds")

start = time.time()
filtered2 = list(filter(lambda x: x % 2 == 0, numbers))
end = time.time()
print(f"Filter function time: {end - start:.4f} seconds")
A. Both run at the same speed because they do the same operation.
B. The filter function is faster because it uses lazy evaluation and processes items one by one.
C. The list comprehension is faster because it is optimized in Python and avoids per-element function calls.
D. The filter function is slower because it creates an intermediate list before filtering.
💡 Hint

Think about how Python executes list comprehensions versus filter with lambda.
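A minimal sketch of how you might measure this more reliably with the standard `timeit` module (the input size and repeat count here are arbitrary choices for a quick demo, not part of the challenge):

```python
import timeit

# timeit runs each snippet repeatedly, smoothing out the one-off noise
# that a single time.time() measurement can pick up.
setup = "numbers = list(range(100_000))"

lc_time = timeit.timeit("[x for x in numbers if x % 2 == 0]",
                        setup=setup, number=20)
flt_time = timeit.timeit("list(filter(lambda x: x % 2 == 0, numbers))",
                         setup=setup, number=20)

# The comprehension's loop runs as optimized bytecode with no Python-level
# function call per element; filter + lambda pays that call for every item.
print(f"list comprehension: {lc_time:.4f}s")
print(f"filter + lambda:    {flt_time:.4f}s")
```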

ā“ data_output
intermediate
2:00remaining
Memory usage difference between data structures

Which data structure uses less memory when storing 1 million integers?

Data Analysis Python
import sys

list_data = list(range(1_000_000))
tuple_data = tuple(range(1_000_000))

print(sys.getsizeof(list_data))
print(sys.getsizeof(tuple_data))
A. Both use the same memory because they contain the same elements.
B. The tuple uses less memory because it is immutable and has less overhead.
C. The list uses less memory because it is mutable and optimized for storage.
D. The tuple uses more memory because it stores extra metadata for immutability.
💡 Hint

Consider the difference between mutable and immutable types in Python.
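A small sketch of the comparison at a size that runs instantly (1,000 elements instead of 1 million; note that `sys.getsizeof` reports only the container itself, not the integers it references):

```python
import sys

n = 1_000
list_data = list(range(n))
tuple_data = tuple(range(n))

list_size = sys.getsizeof(list_data)    # container overhead + item slots
tuple_size = sys.getsizeof(tuple_data)

# A tuple is fixed-size, so it allocates exactly the slots it needs;
# a list carries extra bookkeeping (and may over-allocate to make future
# appends cheap), so in CPython it is typically a bit larger.
print(f"list:  {list_size} bytes")
print(f"tuple: {tuple_size} bytes")
```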

ā“ visualization
advanced
3:00remaining
Visualizing time complexity of sorting algorithms

Which plot correctly shows the time taken by different sorting algorithms as data size increases?

Data Analysis Python
import matplotlib.pyplot as plt
import numpy as np

sizes = np.array([1000, 2000, 4000, 8000, 16000])
quick_sort_times = sizes * np.log2(sizes) * 1e-6
bubble_sort_times = sizes ** 2 * 1e-7

plt.plot(sizes, quick_sort_times, label='Quick Sort')
plt.plot(sizes, bubble_sort_times, label='Bubble Sort')
plt.xlabel('Data Size')
plt.ylabel('Time (seconds)')
plt.title('Sorting Algorithm Time Complexity')
plt.legend()
plt.show()
A. Quick Sort time grows slower than Bubble Sort time as data size increases.
B. Bubble Sort time grows slower than Quick Sort time as data size increases.
C. Both algorithms have the same time growth rate.
D. Quick Sort time grows exponentially while Bubble Sort grows linearly.
💡 Hint

Recall the time complexity: Quick Sort is O(n log n), Bubble Sort is O(n²).
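The same comparison can be made numerically instead of visually: check how much each modeled time grows every time the input size doubles (this reuses the size array from the snippet above; the constant scale factors cancel out of the ratios, so they are omitted):

```python
import numpy as np

sizes = np.array([1000, 2000, 4000, 8000, 16000], dtype=float)
nlogn = sizes * np.log2(sizes)   # Quick Sort model: O(n log n)
nsq = sizes ** 2                 # Bubble Sort model: O(n^2)

# Growth factor per doubling of n: O(n log n) grows slightly more than 2x,
# while O(n^2) grows exactly 4x, so the quadratic curve pulls away fast.
nlogn_ratio = nlogn[1:] / nlogn[:-1]
nsq_ratio = nsq[1:] / nsq[:-1]
print(nlogn_ratio)  # roughly 2.15-2.20 per step
print(nsq_ratio)    # exactly 4.0 per step
```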

🔧 Debug · advanced · 2:30
Identify the cause of slow data processing

Why does this code take a long time to run on a large dataset?

Data Analysis Python
data = list(range(10_000_000))
result = []
for x in data:
    if x % 2 == 0:
        result.append(x)

print(len(result))
A. Using a for loop with append is slow; using a list comprehension is faster.
B. The modulo operation is slow and should be avoided.
C. The list 'data' is too large and should be converted to a set first.
D. The print statement causes the delay by printing too many items.
💡 Hint

Think about how Python handles loops and list comprehensions.
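A sketch of the hinted rewrite: the same filtering as a list comprehension, whose loop runs in optimized bytecode instead of paying a `result.append` attribute lookup and call on every iteration (1 million elements here rather than 10 million, just to keep the demo quick):

```python
data = list(range(1_000_000))

# Explicit loop: a Python-level append call per matching element.
result_loop = []
for x in data:
    if x % 2 == 0:
        result_loop.append(x)

# Equivalent list comprehension: same result, typically faster.
result_comp = [x for x in data if x % 2 == 0]

assert result_loop == result_comp
print(len(result_comp))  # 500000
```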

🚀 Application · expert · 3:00
Choosing the best approach for large dataset aggregation

You have a dataset with 10 million records. You want to calculate the average value of a column. Which approach is most efficient?

A. Load all data into a list and use a for loop to sum and count values.
B. Sort the data first, then calculate the average from the sorted list.
C. Convert the data to a dictionary and use keys to sum values.
D. Use a generator expression to sum values and count without loading all data at once.
💡 Hint

Consider memory usage and processing time when handling large data.
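A sketch of the streaming idea under a toy assumption: `column_values()` below is a hypothetical generator standing in for whatever source yields your column one record at a time (a file, a database cursor, etc.), so the full 10 million values never have to sit in memory at once:

```python
def column_values():
    # Hypothetical stand-in for a real data source; a generator yields
    # one value at a time instead of materializing a list.
    yield from range(10_000_000)

total = 0
count = 0
for value in column_values():  # single pass, O(1) extra memory
    total += value
    count += 1

average = total / count
print(average)  # 4999999.5 for this toy range
```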