Challenge - 5 Problems

❓ Predict Output

intermediate

Output of selecting rows with index labels

What is the output of this code snippet?

import pandas as pd

df = pd.DataFrame({
    'A': [10, 20, 30],
    'B': [100, 200, 300]
}, index=['x', 'y', 'z'])

result = df.loc[['y', 'z'], 'A']
print(result)

A. KeyError: "['y', 'z'] not in index"

B.
0    20
1    30
Name: A, dtype: int64

C.
x    10
y    20
Name: A, dtype: int64

D.
y    20
z    30
Name: A, dtype: int64
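
A related point worth experimenting with: the shape of what loc returns depends on how the column is specified. The sketch below reuses the DataFrame from the question; the variable names as_series and as_frame are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [10, 20, 30], "B": [100, 200, 300]},
    index=["x", "y", "z"],
)

# A scalar column label gives a Series; a list of column labels gives a DataFrame.
as_series = df.loc[["y", "z"], "A"]
as_frame = df.loc[["y", "z"], ["A"]]

print(type(as_series).__name__)
print(type(as_frame).__name__)
print(as_frame.shape)
```

Running it and comparing the two types is a quick way to check your prediction about the printed result.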

❓ Data Output

intermediate

Number of rows after resetting index

Given this DataFrame, what is the number of rows after resetting the index?

import pandas as pd

df = pd.DataFrame({
    'score': [88, 92, 95],
    'grade': ['B', 'A', 'A']
}, index=['s1', 's2', 's3'])

new_df = df.reset_index()
print(len(new_df))

D. KeyError
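
To see what reset_index does to the frame beyond its row count, a small sketch using the same data may help. The names moved and dropped are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {"score": [88, 92, 95], "grade": ["B", "A", "A"]},
    index=["s1", "s2", "s3"],
)

moved = df.reset_index()             # old labels become a column named "index"
dropped = df.reset_index(drop=True)  # old labels are discarded

print(moved.columns.tolist())
print(dropped.columns.tolist())
print(moved.shape, dropped.shape)
```

Comparing the two shapes shows which dimension reset_index can change and which it leaves alone.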

🔧 Debug

advanced

Why does this code raise a KeyError?

Consider this code:

import pandas as pd

df = pd.DataFrame({
    'value': [5, 10, 15]
}, index=[1, 2, 3])

print(df.loc[0])

Why does it raise a KeyError?

A. Because the DataFrame has no columns named 0

B. Because loc only accepts column names, not index labels

C. Because 0 is not in the DataFrame's index labels

D. Because loc requires integer positions, not labels
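
A minimal sketch for experimenting with label-based and position-based access on the same DataFrame. It only inspects the frame and avoids the failing call, so it runs without raising.

```python
import pandas as pd

df = pd.DataFrame({"value": [5, 10, 15]}, index=[1, 2, 3])

print(df.index.tolist())     # the labels that loc matches against
print(0 in df.index)         # membership test against those labels
print(df.iloc[0]["value"])   # first row, selected by position
print(df.loc[1, "value"])    # row whose label is 1
```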

🚀 Application

advanced

Selecting rows efficiently with index

You have a large DataFrame with a unique index of user IDs. Which approach is the fastest way to select multiple users by their IDs?

A. Use df.loc[list_of_user_ids] to select rows by index labels

B. Use df.iloc[list_of_user_ids] to select rows by integer position

C. Use df[df['user_id'].isin(list_of_user_ids)] to filter rows

D. Use df.query('user_id in @list_of_user_ids') to filter rows
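
One way to compare the ID-based candidates empirically is a rough timing sketch like the one below. Everything in it is an assumption made for illustration: the synthetic data, the sizes, the spend column, and the helper names select_loc, select_isin and select_query. Because options C and D need user_id as a column, the same table is stored twice, once indexed by user_id and once with user_id as an ordinary column.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data: 100,000 users, stored once with user_id as the index
# and once with user_id as an ordinary column.
n = 100_000
rng = np.random.default_rng(0)
by_index = pd.DataFrame(
    {"spend": rng.random(n)},
    index=pd.Index(np.arange(n), name="user_id"),
)
by_column = by_index.reset_index()

list_of_user_ids = rng.choice(n, size=1_000, replace=False).tolist()


def select_loc():
    return by_index.loc[list_of_user_ids]


def select_isin():
    return by_column[by_column["user_id"].isin(list_of_user_ids)]


def select_query():
    ids = list_of_user_ids  # local name so that @ids resolves inside query()
    return by_column.query("user_id in @ids")


# All three should return the same set of users (row order may differ).
expected = set(list_of_user_ids)
assert set(select_loc().index) == expected
assert set(select_isin()["user_id"]) == expected
assert set(select_query()["user_id"]) == expected

for fn in (select_loc, select_isin, select_query):
    seconds = timeit.timeit(fn, number=5)
    print(f"{fn.__name__}: {seconds:.4f} s for 5 runs")
```

Timings vary with the pandas version, the data size and the machine, so treat the numbers as a rough guide and try a few sizes before drawing conclusions.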

🧠 Conceptual

expert

Why setting an index improves join performance

Why does setting a column as the index improve the performance of joining two DataFrames on that column?

A. Because indexes are implemented as hash tables or trees, allowing faster lookups during joins

B. Because setting an index compresses the data, reducing memory usage during joins

C. Because setting an index sorts the data, and joins require sorted data

D. Because joins only work on index columns, so setting the index is mandatory
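
For readers who want to try both styles of join, here is a small sketch with made-up data. The frames left and right and their columns are hypothetical. It shows only that a column-based merge and an index-based join can produce the same rows. It makes no performance claim, which would need larger data and timing to assess.

```python
import pandas as pd

# Hypothetical tables sharing a user_id key.
left = pd.DataFrame({"user_id": [1, 2, 3], "name": ["Ann", "Bo", "Cy"]})
right = pd.DataFrame({"user_id": [2, 3, 4], "spend": [20.0, 30.0, 40.0]})

# Join on a regular column.
on_column = left.merge(right, on="user_id", how="inner")

# Join on the index after promoting user_id to the index on both sides.
left_ix = left.set_index("user_id")
right_ix = right.set_index("user_id")
on_index = left_ix.join(right_ix, how="inner")

print(on_column)
print(on_index)

# Properties of the index that pandas can inspect when matching keys.
print(left_ix.index.is_unique, left_ix.index.is_monotonic_increasing)
```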
