Challenge - 5 Problems

❓ Predict Output

intermediate

What is the output of this DataFrame merge?

Consider two DataFrames merged on a common column. What is the resulting DataFrame?

Pandas

import pandas as pd

df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'val1': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['B', 'C', 'D'], 'val2': [4, 5, 6]})
result = pd.merge(df1, df2, on='key', how='inner')
print(result)

A.
  key  val1  val2
0   B     2     4
1   C     3     5

B.
  key  val1  val2
0   A     1     4
1   B     2     5
2   C     3     6

C.
  key  val1  val2
0   B     2     5
1   C     3     6
2   D   NaN     6

D.
  key  val1  val2
0   A     1   NaN
1   B     2   NaN
2   C     3   NaN
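
One way to check a prediction like this is to compare the merged keys against the set of keys the two inputs share. The sketch below uses hypothetical frames (`left`, `right`, with columns `x` and `y`) rather than the ones in the question, so it illustrates the check without running the question's code.

```python
import pandas as pd

# Hypothetical frames, not the ones from the question above.
left = pd.DataFrame({'key': ['P', 'Q', 'R'], 'x': [1, 2, 3]})
right = pd.DataFrame({'key': ['Q', 'R', 'S'], 'y': [7, 8, 9]})

# Keys that appear in both frames.
shared = set(left['key']) & set(right['key'])

inner = pd.merge(left, right, on='key', how='inner')

# An inner merge keeps only rows whose key appears in both inputs.
assert set(inner['key']) == shared
print(inner)
```

The same comparison can be run against the question's `df1` and `df2` to confirm which option matches.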

❓ Data Output

intermediate

How many rows after concatenation?

Two DataFrames are concatenated vertically. How many rows does the result have?

Pandas

import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6, 7], 'B': [8, 9, 10]})
result = pd.concat([df1, df2], ignore_index=True)
print(len(result))
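
As a rough illustration of how vertical concatenation treats rows and index labels, the sketch below stacks two hypothetical frames (`top` and `bottom`, not the ones above) with and without `ignore_index=True`.

```python
import pandas as pd

# Hypothetical frames with 1 and 4 rows.
top = pd.DataFrame({'A': [0], 'B': [1]})
bottom = pd.DataFrame({'A': [2, 3, 4, 5], 'B': [6, 7, 8, 9]})

stacked = pd.concat([top, bottom], ignore_index=True)  # fresh 0..n-1 index
kept = pd.concat([top, bottom])                        # original labels kept

# Vertical concatenation never drops rows: the lengths simply add up.
assert len(stacked) == len(top) + len(bottom)

print(stacked.index.tolist())  # [0, 1, 2, 3, 4]
print(kept.index.tolist())     # [0, 0, 1, 2, 3]
```

`ignore_index` only affects the index labels, not the number of rows.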

🔧 Debug

advanced

Why does this merge produce NaN values?

Examine the code and identify why the merged DataFrame has NaN values in some columns.

Pandas

import pandas as pd

df1 = pd.DataFrame({'id': [1, 2, 3], 'score': [90, 80, 70]})
df2 = pd.DataFrame({'id': [2, 3, 4], 'grade': ['B', 'C', 'D']})
result = pd.merge(df1, df2, on='id', how='left')
print(result)

A. Because the 'id' columns have different data types causing merge failure.

B. Because 'how="left"' keeps all rows from df2, missing matches in df1 cause NaN.

C. Because 'how="left"' keeps all rows from df1, missing matches in df2 cause NaN.

D. Because the merge key 'id' is missing in one DataFrame.
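
When debugging NaN values after a merge, the `indicator=True` parameter of `pd.merge` can help: it adds a `_merge` column recording whether each row came from the left frame only, the right frame only, or both. A minimal sketch with hypothetical frames (`orders` and `prices` are illustrative names, not part of the question):

```python
import pandas as pd

# Hypothetical frames; id 10 has no counterpart in `prices`.
orders = pd.DataFrame({'id': [10, 11, 12], 'qty': [1, 2, 3]})
prices = pd.DataFrame({'id': [11, 12, 13], 'price': [5.0, 6.0, 7.0]})

checked = pd.merge(orders, prices, on='id', how='left', indicator=True)

# Rows marked 'left_only' had no matching key in the right frame,
# so their right-hand columns are filled with NaN.
unmatched = checked[checked['_merge'] == 'left_only']
print(unmatched)
```

Running the same check on the question's `df1` and `df2` shows which rows carry the NaN values and why.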

❓ Visualization

advanced

Which plot shows the combined data correctly?

Given two DataFrames combined by concatenation, which plot correctly shows the combined data distribution?

Pandas

import pandas as pd
import matplotlib.pyplot as plt

df1 = pd.DataFrame({'value': [1, 2, 3]})
df2 = pd.DataFrame({'value': [4, 5, 6]})
combined = pd.concat([df1, df2], ignore_index=True)
plt.hist(combined['value'])
plt.show()

A. A scatter plot with points only from df1

B. A histogram with bars at 1, 2, 3, 4, 5, 6 each with height 1

C. A bar chart showing counts of df2 values only

D. A line plot connecting points 1 to 6
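
To reason about what a histogram will draw without opening a plot window, the bin counts can be inspected directly with `np.histogram`, which `plt.hist` uses for binning according to the matplotlib documentation. The sketch below uses hypothetical frames `a` and `b` and an explicit `bins=4`. This is an assumption made for readability; `plt.hist` defaults to 10 bins unless configured otherwise.

```python
import numpy as np
import pandas as pd

# Hypothetical frames, not the ones from the question above.
a = pd.DataFrame({'value': [10, 20]})
b = pd.DataFrame({'value': [30, 40]})
both = pd.concat([a, b], ignore_index=True)

# Four equal-width bins spanning the data range 10..40.
counts, edges = np.histogram(both['value'], bins=4)

print(counts.tolist())  # [1, 1, 1, 1]
print(edges.tolist())   # [10.0, 17.5, 25.0, 32.5, 40.0]

# Every value from both inputs lands in exactly one bin.
assert counts.sum() == len(both)
```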

🚀 Application

expert

How do you combine DataFrames to keep all unique keys?

You have two DataFrames with overlapping and unique keys. You want to combine them so that all keys appear, with matching data where possible. Which merge option achieves this?

Pandas

import pandas as pd

df1 = pd.DataFrame({'key': ['X', 'Y', 'Z'], 'val1': [10, 20, 30]})
df2 = pd.DataFrame({'key': ['Y', 'Z', 'W'], 'val2': [40, 50, 60]})
result = pd.merge(df1, df2, on='key', how=?)
print(result)

A. "inner"

B. "right"

C. "left"

D. "outer"
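
To compare the four `how` options side by side, one approach is to run the same merge with each and list the keys that survive. The sketch below uses hypothetical frames (`left`, `right`, with columns `l` and `r`) so the pattern can be adapted to the question's data.

```python
import pandas as pd

# Hypothetical frames: 'b' is shared, 'a' is left-only, 'c' and 'd' are right-only.
left = pd.DataFrame({'key': ['a', 'b'], 'l': [1, 2]})
right = pd.DataFrame({'key': ['b', 'c', 'd'], 'r': [3, 4, 5]})

# Collect the keys that each join type keeps.
keys_by_how = {
    how: sorted(pd.merge(left, right, on='key', how=how)['key'])
    for how in ('inner', 'left', 'right', 'outer')
}

for how, keys in keys_by_how.items():
    print(how, keys)
# inner ['b']
# left ['a', 'b']
# right ['b', 'c', 'd']
# outer ['a', 'b', 'c', 'd']
```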
