What is the output of this code snippet?
import pandas as pd
df = pd.DataFrame({
'A': [10, 20, 30],
'B': [100, 200, 300]
}, index=['x', 'y', 'z'])
result = df.loc[['y', 'z'], 'A']
print(result)
Remember that loc uses the index labels, not integer positions.
The loc indexer selects rows by their index labels. The DataFrame's index labels are 'x', 'y', 'z', so ['y', 'z'] picks the rows with those labels, and 'A' picks a single column. The result is a Series with index labels 'y' and 'z' and values 20 and 30 (Name: A, dtype: int64).
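To make the label-versus-position distinction concrete, here is a minimal sketch that runs the same selection with loc and with iloc. The variable names by_label and by_position are illustrative and not part of the quiz.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [10, 20, 30], "B": [100, 200, 300]},
    index=["x", "y", "z"],
)

# Label-based: rows labelled 'y' and 'z', column labelled 'A'.
by_label = df.loc[["y", "z"], "A"]

# Position-based: rows at positions 1 and 2, column at position 0.
by_position = df.iloc[[1, 2], 0]

print(by_label.tolist())             # [20, 30]
print(list(by_label.index))          # ['y', 'z']
print(by_label.equals(by_position))  # True
```

Both calls return the same Series here only because 'y' and 'z' happen to sit at positions 1 and 2.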
Given this DataFrame, what is the number of rows after resetting the index?
import pandas as pd
df = pd.DataFrame({
'score': [88, 92, 95],
'grade': ['B', 'A', 'A']
}, index=['s1', 's2', 's3'])
new_df = df.reset_index()
print(len(new_df))
Resetting the index adds the old index as a column but does not change the number of rows.
The reset_index() method moves the index into a column and creates a new default integer index. The number of rows remains the same, which is 3.
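As a small illustration of the point above, the sketch below compares reset_index() with reset_index(drop=True). The names kept and dropped are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"score": [88, 92, 95], "grade": ["B", "A", "A"]},
    index=["s1", "s2", "s3"],
)

# The old index becomes a column; it is named 'index' because the
# original index has no name.
kept = df.reset_index()

# With drop=True the old index is discarded instead of being kept.
dropped = df.reset_index(drop=True)

print(list(kept.columns))       # ['index', 'score', 'grade']
print(list(dropped.columns))    # ['score', 'grade']
print(len(kept), len(dropped))  # 3 3
```

In both cases only the columns change; the row count stays at 3.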
Consider this code:
import pandas as pd
df = pd.DataFrame({
'value': [5, 10, 15]
}, index=[1, 2, 3])
print(df.loc[0])
Why does it raise a KeyError?
Check the index labels of the DataFrame and what loc expects.
The DataFrame's index labels are [1, 2, 3]. Using loc[0] tries to access the row with label 0, which does not exist, causing a KeyError.
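The following sketch reproduces the error and shows two ways to get the first row, assuming that is what the code intended: by position with iloc, or by the label that actually exists. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"value": [5, 10, 15]}, index=[1, 2, 3])

# loc looks for the label 0, which is not in the index [1, 2, 3].
try:
    df.loc[0]
    raised_key_error = False
except KeyError:
    raised_key_error = True

# Checking membership first avoids the exception.
label_exists = 0 in df.index

# First row by position, and the same row by its real label.
first_by_position = int(df.iloc[0]["value"])
first_by_label = int(df.loc[1, "value"])

print(raised_key_error)   # True
print(label_exists)       # False
print(first_by_position)  # 5
print(first_by_label)     # 5
```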
You have a large DataFrame with a unique index of user IDs. Which method is the fastest to select multiple users by their IDs?
Think about how pandas uses the index for fast lookups.
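Using df.loc[list_of_user_ids] looks each ID up through the index, which for a unique index is a direct lookup per ID rather than a scan of every row. The other methods involve filtering, which compares every row against the ID list, or positional indexing, which requires working out each ID's position first. Both do more work on large data.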
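A rough way to check this on your own data is to time both approaches. The sketch below uses a made-up DataFrame with string user IDs (the column name score and the IDs are assumptions for illustration) and compares loc against a boolean mask built with Index.isin. Timings vary with data size, index type, and pandas version, so none are shown.

```python
import timeit

import pandas as pd

# Hypothetical data: 100,000 users with unique string IDs as the index.
ids = [f"u{i}" for i in range(100_000)]
df = pd.DataFrame({"score": range(100_000)}, index=pd.Index(ids, name="user_id"))

wanted = ["u5", "u50000", "u99999"]

# Index-based lookup: each ID is looked up through the index.
via_loc = df.loc[wanted]

# Boolean filtering: every index value is tested against the wanted IDs.
via_mask = df[df.index.isin(wanted)]

print(via_loc["score"].tolist())   # [5, 50000, 99999]
print(via_mask["score"].tolist())  # [5, 50000, 99999]

# Rough timing comparison; the first loc call above already built the
# index's lookup structure, so it is not counted here.
loc_time = timeit.timeit(lambda: df.loc[wanted], number=20)
mask_time = timeit.timeit(lambda: df[df.index.isin(wanted)], number=20)
print(f"loc: {loc_time:.4f}s  isin mask: {mask_time:.4f}s")
```

Note that loc returns rows in the order of the requested IDs, while the mask keeps the DataFrame's original order; they match here only because wanted is listed in that order.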
Why does setting a column as the index improve the performance of joining two DataFrames on that column?
Think about how indexes help find matching rows quickly.
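A pandas index is backed by a hash table, and can use binary search when it is sorted, which allows quick lookup of matching keys during joins. This can speed up the join operation compared to scanning all rows, particularly when the same indexed DataFrame is joined repeatedly and its lookup structure is reused.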
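For reference, here is a minimal sketch of the two join styles on made-up data: a column-based merge, and a join against a DataFrame whose key column has been set as the index. The users and orders tables are hypothetical, and the example shows only that the results agree, not how fast each one is.

```python
import pandas as pd

# Hypothetical tables sharing a 'user_id' key.
users = pd.DataFrame({"user_id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]})
orders = pd.DataFrame({"user_id": [3, 1, 1], "total": [9.5, 20.0, 4.25]})

# Column-based join: both key columns are matched at merge time.
merged = orders.merge(users, on="user_id", how="left")

# Index-based join: set the key as the index once, then match the
# 'user_id' column of orders against that index.
users_idx = users.set_index("user_id")
joined = orders.join(users_idx, on="user_id")

print(merged["name"].tolist())  # ['Cy', 'Ann', 'Ann']
print(joined["name"].tolist())  # ['Cy', 'Ann', 'Ann']
```

Setting the index once and reusing users_idx across several joins is where an index-based approach is most likely to pay off; for a single join the difference may be small.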