Challenge - 5 Problems

❓ query_result

intermediate

Output of merging two DataFrames on index with inner join

Given two DataFrames df1 and df2 with string index labels, what is the result of merging them on their indexes with how='inner'? Each option shows the result as a dictionary keyed by index label.

Pandas

import pandas as pd

df1 = pd.DataFrame({'value1': [10, 20, 30]}, index=['a', 'b', 'c'])
df2 = pd.DataFrame({'value2': [100, 200, 300]}, index=['b', 'c', 'd'])

result = df1.merge(df2, left_index=True, right_index=True, how='inner')
print(result)

A. {'b': {'value1': 20, 'value2': 100}, 'c': {'value1': 30, 'value2': 200}}

B. {'a': {'value1': 10, 'value2': 100}, 'b': {'value1': 20, 'value2': 200}, 'c': {'value1': 30, 'value2': 300}}

C. {'a': {'value1': 10}, 'b': {'value1': 20, 'value2': 100}, 'c': {'value1': 30, 'value2': 200}, 'd': {'value2': 300}}

D. {'b': {'value1': 20, 'value2': 200}, 'c': {'value1': 30, 'value2': 300}}
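
The options above are written as dictionaries, while print(result) displays a table. The following is a minimal sketch, using hypothetical data and names (left, right, price, stock) that are not part of the challenge, of how an index-on-index inner join can be converted to that dictionary shape with to_dict("index").

```python
import pandas as pd

# Hypothetical frames, deliberately different from the challenge data.
left = pd.DataFrame({"price": [1.5, 2.5, 3.5]}, index=["p", "q", "r"])
right = pd.DataFrame({"stock": [7, 8, 9]}, index=["q", "r", "s"])

# An inner join keeps only the index labels present in both frames.
inner = left.merge(right, left_index=True, right_index=True, how="inner")

# to_dict("index") produces {index_label: {column_name: value}}.
inner_as_dict = inner.to_dict("index")
print(inner)
print(inner_as_dict)
```

Each value stays paired with the label it had in its own frame; the join aligns by label, not by position.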

❓ query_result

intermediate

Result of merging on index with left join

What is the result of merging df1 and df2 on their indexes using how='left'? Each option shows the result as a dictionary keyed by index label.

Pandas

import pandas as pd

df1 = pd.DataFrame({'value1': [1, 2, 3]}, index=['x', 'y', 'z'])
df2 = pd.DataFrame({'value2': [10, 20]}, index=['y', 'z'])

result = df1.merge(df2, left_index=True, right_index=True, how='left')
print(result)

A. {'x': {'value1': 1, 'value2': NaN}, 'y': {'value1': 2, 'value2': 10.0}, 'z': {'value1': 3, 'value2': 20.0}}

B. {'y': {'value1': 2, 'value2': 10}, 'z': {'value1': 3, 'value2': 20}}

C. {'x': {'value1': 1}, 'y': {'value1': 2, 'value2': 10}, 'z': {'value1': 3, 'value2': 20}}

D. {'x': {'value1': 1, 'value2': 10.0}, 'y': {'value1': 2, 'value2': 20.0}, 'z': {'value1': 3, 'value2': NaN}}
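
For readers unsure how pandas represents a left-side row with no match on the right, here is a small sketch with hypothetical data (left, right, qty, bonus are illustrative names, not from the challenge). It assumes the default NumPy-backed integer dtype; nullable dtypes such as Int64 would behave differently.

```python
import pandas as pd

# Hypothetical frames, deliberately different from the challenge data.
left = pd.DataFrame({"qty": [5, 6, 7]}, index=["k1", "k2", "k3"])
right = pd.DataFrame({"bonus": [50, 60]}, index=["k2", "k3"])

# A left join keeps every row of the left frame, in its original order.
joined = left.merge(right, left_index=True, right_index=True, how="left")

print(joined)
# With default dtypes the integer column is upcast so it can hold NaN.
print(joined["bonus"].dtype)
# Marks which left rows had no partner in the right frame.
print(joined["bonus"].isna().tolist())
```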

📝 Syntax

advanced

Identify the syntax error in merging on index

Which option contains a syntax error when merging two DataFrames on their indexes?

Pandas

import pandas as pd

df1 = pd.DataFrame({'A': [1,2]}, index=['a','b'])
df2 = pd.DataFrame({'B': [3,4]}, index=['a','b'])

A. df1.merge(df2, left_index=True, right_index=True, how='right')

B. df1.merge(df2, left_index=True right_index=True, how='inner')

C. df1.merge(df2, left_index=True, right_index=True, how='left')

D. df1.merge(df2, left_index=True, right_index=True, how='outer')
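
One way to check a call for syntax errors without running it is to hand the source text to Python's own parser. The sketch below uses the standard-library ast module; the helper name parses and the sample strings are hypothetical and are not taken from the options above.

```python
import ast


def parses(source: str) -> bool:
    """Return True if `source` is syntactically valid Python."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


# Hypothetical call strings: the second is missing a comma between
# two keyword arguments.
ok_call = "f(x, a=1, b=2)"
bad_call = "f(x, a=1 b=2)"

print(parses(ok_call), parses(bad_call))
```

Parsing only checks grammar; it does not verify that the function or its parameters exist.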

❓ optimization

advanced

Optimizing merge on index for large DataFrames

You have two large DataFrames that share the same index key. Which approach is typically the fastest way to merge them on their indexes?

A. Convert indexes to columns, merge, then set index back

B. Reset indexes on both DataFrames and merge on the index column as a regular column

C. Use df1.merge(df2, left_index=True, right_index=True) without resetting indexes

D. Use pd.concat([df1, df2], axis=1) without merge
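
Performance claims like these are best checked by measurement. The following is a rough timing sketch, not a benchmark: the frame size n, the index name "id", and the function names are all assumptions made for illustration, and results will vary with pandas version, data size, and index type.

```python
import timeit

import numpy as np
import pandas as pd

n = 50_000  # hypothetical size; increase it to approximate "large" frames
rng = np.random.default_rng(0)
big1 = pd.DataFrame({"a": rng.random(n)}, index=pd.RangeIndex(n, name="id"))
big2 = pd.DataFrame({"b": rng.random(n)}, index=pd.RangeIndex(n, name="id"))


def merge_on_index():
    # Join directly on the existing indexes.
    return big1.merge(big2, left_index=True, right_index=True)


def reset_then_merge():
    # Move the index into a column, merge on it, then restore the index.
    return big1.reset_index().merge(big2.reset_index(), on="id").set_index("id")


def concat_columns():
    # Column-wise concat aligns on the index; join="inner" mirrors merge's default.
    return pd.concat([big1, big2], axis=1, join="inner")


for fn in (merge_on_index, reset_then_merge, concat_columns):
    seconds = timeit.timeit(fn, number=3)
    print(f"{fn.__name__}: {seconds:.4f}s for 3 runs")
```

Note that pd.concat with axis=1 uses an outer join unless join="inner" is passed, so it is only interchangeable with merge when the join semantics match.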

🧠 Conceptual

expert

Understanding merge behavior with duplicate indexes

Consider two DataFrames where df1 contains duplicate index labels. What happens when you merge them on their indexes?

Pandas

import pandas as pd

df1 = pd.DataFrame({'val1': [1,2,3]}, index=['a','a','b'])
df2 = pd.DataFrame({'val2': [10,20]}, index=['a','b'])

result = df1.merge(df2, left_index=True, right_index=True, how='inner')
print(result)

A. Rows with index 'a' in df1 are matched with the single 'a' in df2, resulting in multiple rows for 'a' in the output

B. Only the first occurrence of 'a' in df1 is matched with df2, others are dropped

C. Merge fails with a ValueError due to duplicate indexes

D. Duplicates in indexes are ignored and only unique indexes are merged
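
Duplicate keys are easy to explore directly. This sketch uses hypothetical data (orders, users, amount, tier are illustrative names) to count the rows an index merge produces, and also shows merge's validate parameter, which can be used to reject duplicate keys explicitly if that is the behavior you want.

```python
import pandas as pd

# Hypothetical frames: "u1" appears twice in the left index.
orders = pd.DataFrame({"amount": [5, 7, 9]}, index=["u1", "u1", "u2"])
users = pd.DataFrame({"tier": ["gold", "free"]}, index=["u1", "u2"])

merged = orders.merge(users, left_index=True, right_index=True, how="inner")
print(merged)
print("rows in result:", len(merged))

# validate="one_to_one" asks pandas to check key uniqueness on both sides
# and raise MergeError if the check fails.
validation_error = None
try:
    orders.merge(users, left_index=True, right_index=True, validate="one_to_one")
except pd.errors.MergeError as exc:
    validation_error = str(exc)

print("validation error:", validation_error)
```

If duplicates on the left are expected, validate="many_to_one" expresses that intent while still guarding against duplicates on the right.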
