
Why combining DataFrames matters in Pandas - Challenge Your Understanding

Challenge - 5 Problems
Predict Output
intermediate
What is the output of this DataFrame merge?

Consider two DataFrames merged on a common column. What is the resulting DataFrame?

Pandas
import pandas as pd

df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'val1': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['B', 'C', 'D'], 'val2': [4, 5, 6]})
result = pd.merge(df1, df2, on='key', how='inner')
print(result)
A
  key  val1  val2
0   B     2     4
1   C     3     5
B
  key  val1  val2
0   A     1     4
1   B     2     5
2   C     3     6
C
  key  val1  val2
0   B     2     5
1   C     3     6
2   D   NaN     6
D
  key  val1  val2
0   A     1   NaN
1   B     2   NaN
2   C     3   NaN
💡 Hint

Inner merge keeps only keys present in both DataFrames.
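
As a rough illustration of this hint, the sketch below uses two small hypothetical DataFrames (`left` and `right`, which are not the challenge data) and checks that the keys surviving an inner merge are the intersection of the two key columns.

```python
import pandas as pd

# Hypothetical data, chosen only for illustration (not the challenge data).
left = pd.DataFrame({'key': ['p', 'q', 'r'], 'x': [1, 2, 3]})
right = pd.DataFrame({'key': ['q', 'r', 's'], 'y': [10, 20, 30]})

inner = pd.merge(left, right, on='key', how='inner')

# Keys present in both inputs.
shared_keys = set(left['key']) & set(right['key'])

# Keys that actually appear in the inner merge result.
inner_keys = set(inner['key'])

# Prints whether the two sets agree.
print(inner_keys == shared_keys)
```

The same comparison can be run on the challenge's `df1` and `df2` to check a chosen answer.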

Data Output
intermediate
How many rows after concatenation?

Two DataFrames are concatenated vertically. How many rows does the result have?

Pandas
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6, 7], 'B': [8, 9, 10]})
result = pd.concat([df1, df2], ignore_index=True)
print(len(result))
A. 5
B. 6
C. 4
D. 3
💡 Hint

Count rows in both DataFrames and add them.
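
A minimal sketch of the counting rule, assuming two hypothetical frames (`first` and `second`, not the challenge data) that share the same columns. It also shows what `ignore_index=True` does to the row labels.

```python
import pandas as pd

# Hypothetical frames with the same columns (not the challenge data).
first = pd.DataFrame({'A': [10, 20, 30, 40], 'B': [1, 1, 1, 1]})
second = pd.DataFrame({'A': [50], 'B': [2]})

stacked = pd.concat([first, second], ignore_index=True)

# Vertical concatenation appends rows, so the lengths add up.
rows_match = len(stacked) == len(first) + len(second)

# ignore_index=True renumbers the rows 0..n-1 instead of keeping
# each input's original labels.
index_is_fresh = list(stacked.index) == list(range(len(stacked)))

print(rows_match, index_is_fresh)
```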

🔧 Debug
advanced
Why does this merge produce NaN values?

Examine the code and identify why the merged DataFrame has NaN values in some columns.

Pandas
import pandas as pd

df1 = pd.DataFrame({'id': [1, 2, 3], 'score': [90, 80, 70]})
df2 = pd.DataFrame({'id': [2, 3, 4], 'grade': ['B', 'C', 'D']})
result = pd.merge(df1, df2, on='id', how='left')
print(result)
A. Because the 'id' columns have different data types, which makes the merge fail.
B. Because how='left' keeps all rows from df2, and rows without a match in df1 get NaN.
C. Because how='left' keeps all rows from df1, and rows without a match in df2 get NaN.
D. Because the merge key 'id' is missing from one of the DataFrames.
💡 Hint

Left merge keeps all rows from the left DataFrame.
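
One way to see where NaN values come from is the `indicator=True` option of `pd.merge`, which labels each output row by its origin. The sketch below uses hypothetical `students` and `clubs` frames (not the challenge data).

```python
import pandas as pd

# Hypothetical data (not the challenge data).
students = pd.DataFrame({'id': [10, 11, 12], 'name': ['Ann', 'Bo', 'Cy']})
clubs = pd.DataFrame({'id': [11, 12, 13], 'club': ['chess', 'art', 'band']})

# indicator=True adds a '_merge' column: 'both', 'left_only' or 'right_only'.
merged = pd.merge(students, clubs, on='id', how='left', indicator=True)

# Rows found only in the left frame have no partner on the right,
# so the right-hand columns are filled with NaN for them.
unmatched = merged[merged['_merge'] == 'left_only']
print(unmatched[['id', 'club']])
```

Filtering on `_merge` is a quick diagnostic when a merge returns unexpected missing values.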

Visualization
advanced
Which plot shows the combined data correctly?

Given two DataFrames combined by concatenation, which plot correctly shows the combined data distribution?

Pandas
import pandas as pd
import matplotlib.pyplot as plt

df1 = pd.DataFrame({'value': [1, 2, 3]})
df2 = pd.DataFrame({'value': [4, 5, 6]})
combined = pd.concat([df1, df2], ignore_index=True)
plt.hist(combined['value'])
plt.show()
A. A scatter plot with points only from df1
B. A histogram with bars at 1, 2, 3, 4, 5, 6, each with height 1
C. A bar chart showing counts of df2 values only
D. A line plot connecting points 1 to 6
💡 Hint

Histogram shows frequency of all combined values.
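
The bin counts behind a histogram can be inspected without drawing anything. The sketch below uses `np.histogram`, which defaults to 10 equal-width bins just as `plt.hist` does, on hypothetical frames `part1` and `part2` (not the challenge data).

```python
import numpy as np
import pandas as pd

# Hypothetical data (not the challenge data).
part1 = pd.DataFrame({'value': [10, 10, 20]})
part2 = pd.DataFrame({'value': [20, 30, 30]})
both = pd.concat([part1, part2], ignore_index=True)

# counts holds the bar heights, edges holds the bin boundaries.
counts, edges = np.histogram(both['value'])

# Every value from both inputs lands in exactly one bin,
# so the bar heights sum to the combined row count.
print(counts.sum() == len(both))
```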

🚀 Application
expert
How to combine DataFrames to keep all unique keys?

You have two DataFrames with overlapping and unique keys. You want to combine them so that all keys appear, with matching data where possible. Which merge option achieves this?

Pandas
import pandas as pd

df1 = pd.DataFrame({'key': ['X', 'Y', 'Z'], 'val1': [10, 20, 30]})
df2 = pd.DataFrame({'key': ['Y', 'Z', 'W'], 'val2': [40, 50, 60]})
result = pd.merge(df1, df2, on='key', how=?)
print(result)
A. "inner"
B. "right"
C. "left"
D. "outer"
💡 Hint

Think about keeping all keys from both DataFrames.
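
One way to check an answer is to run every `how` option and compare which keys survive. The sketch below does this on hypothetical frames `a` and `b` (not the challenge data).

```python
import pandas as pd

# Hypothetical data (not the challenge data).
a = pd.DataFrame({'key': ['k1', 'k2'], 'left_val': [1, 2]})
b = pd.DataFrame({'key': ['k2', 'k3'], 'right_val': [3, 4]})

# Collect the keys that survive each merge type.
keys_by_how = {
    how: sorted(pd.merge(a, b, on='key', how=how)['key'])
    for how in ['inner', 'left', 'right', 'outer']
}

for how, keys in keys_by_how.items():
    print(how, keys)
```

Swapping in the challenge's `df1` and `df2` shows which option keeps every key from both inputs.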