Consider two DataFrames merged on a common column. What is the resulting DataFrame?
import pandas as pd
df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'val1': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['B', 'C', 'D'], 'val2': [4, 5, 6]})
result = pd.merge(df1, df2, on='key', how='inner')
print(result)
Inner merge keeps only keys present in both DataFrames.
The inner merge keeps only the rows whose key appears in both DataFrames, here 'B' and 'C'. The result has two rows with columns key, val1, and val2: ('B', 2, 4) and ('C', 3, 5).
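As a rough check of the explanation above, the sketch below rebuilds the same two frames and compares the merged keys with the plain set intersection of the key columns. The name `matched_keys` is introduced here for illustration and is not part of the original exercise.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'val1': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['B', 'C', 'D'], 'val2': [4, 5, 6]})

# Inner merge: only keys found in both frames survive.
result = pd.merge(df1, df2, on='key', how='inner')

# Illustrative cross-check: the surviving keys should equal the set intersection.
matched_keys = sorted(set(df1['key']) & set(df2['key']))

print(matched_keys)
print(result)
```

Keys 'A' and 'D' each appear in only one frame, so neither shows up in the output.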
Two DataFrames are concatenated vertically. How many rows does the result have?
import pandas as pd
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6, 7], 'B': [8, 9, 10]})
result = pd.concat([df1, df2], ignore_index=True)
print(len(result))
Count rows in both DataFrames and add them.
df1 has 2 rows and df2 has 3 rows, so the concatenated result has 2 + 3 = 5 rows. ignore_index=True only renumbers the index from 0 to 4; it does not change the row count.
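To separate the row count from the index labels, the following sketch concatenates the same frames with and without `ignore_index=True`. The variable names `kept` and `renumbered` are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6, 7], 'B': [8, 9, 10]})

# Default behaviour: each frame keeps its own index labels.
kept = pd.concat([df1, df2])

# ignore_index=True discards the old labels and assigns a fresh 0..n-1 index.
renumbered = pd.concat([df1, df2], ignore_index=True)

print(len(kept), len(renumbered))
print(kept.index.tolist())
print(renumbered.index.tolist())
```

Both results have the same number of rows; only the index labels differ, and the default version contains duplicate labels.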
Examine the code and identify why the merged DataFrame contains a NaN value in the 'grade' column.
import pandas as pd
df1 = pd.DataFrame({'id': [1, 2, 3], 'score': [90, 80, 70]})
df2 = pd.DataFrame({'id': [2, 3, 4], 'grade': ['B', 'C', 'D']})
result = pd.merge(df1, df2, on='id', how='left')
print(result)
Left merge keeps all rows from the left DataFrame.
A left merge keeps every row from df1. The row with id=1 has no match in df2, so its 'grade' is NaN. The row with id=4 exists only in df2 and is dropped.
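One way to locate the unmatched rows directly is the `indicator=True` option of `pd.merge`, which adds a `_merge` column describing where each row came from. This is a sketch of that diagnostic under the same data; the name `unmatched` is introduced here for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'id': [1, 2, 3], 'score': [90, 80, 70]})
df2 = pd.DataFrame({'id': [2, 3, 4], 'grade': ['B', 'C', 'D']})

# indicator=True adds a '_merge' column: 'both', 'left_only', or 'right_only'.
result = pd.merge(df1, df2, on='id', how='left', indicator=True)

# Rows present only in df1 are the ones whose 'grade' is NaN.
unmatched = result[result['_merge'] == 'left_only']

print(result)
print(unmatched['id'].tolist())
```

If the NaN is unwanted, typical follow-ups are `how='inner'` to drop unmatched rows or `fillna` on the 'grade' column to supply a default.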
Given two DataFrames combined by concatenation, which plot correctly shows the combined data distribution?
import pandas as pd
import matplotlib.pyplot as plt
df1 = pd.DataFrame({'value': [1, 2, 3]})
df2 = pd.DataFrame({'value': [4, 5, 6]})
combined = pd.concat([df1, df2], ignore_index=True)
plt.hist(combined['value'])
plt.show()
Histogram shows frequency of all combined values.
The histogram counts all six values from both DataFrames. Each value occurs once and, with matplotlib's default of 10 bins, lands in its own bin, so every non-empty bar has height 1.
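The bar heights can be checked without drawing anything. The sketch below uses `np.histogram` with `bins=10`, on the assumption that this matches the default bin count `plt.hist` applies when no `bins` argument is given.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'value': [1, 2, 3]})
df2 = pd.DataFrame({'value': [4, 5, 6]})
combined = pd.concat([df1, df2], ignore_index=True)

# Assumption: 10 equal-width bins over [1, 6], mirroring plt.hist's default.
counts, edges = np.histogram(combined['value'], bins=10)

print(counts.tolist())
print(edges.tolist())
```

The counts sum to 6 and never exceed 1, which agrees with the explanation; the zero entries are the empty bins between the integer values.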
You have two DataFrames with overlapping and unique keys. You want to combine them so that all keys appear, with matching data where possible. Which merge option achieves this?
import pandas as pd
df1 = pd.DataFrame({'key': ['X', 'Y', 'Z'], 'val1': [10, 20, 30]})
df2 = pd.DataFrame({'key': ['Y', 'Z', 'W'], 'val2': [40, 50, 60]})
result = pd.merge(df1, df2, on='key', how=?)
print(result)
Think about keeping all keys from both DataFrames.
An outer merge (how='outer') keeps all keys from both DataFrames and fills in NaN where a key has no match. Here val2 is NaN for 'X' and val1 is NaN for 'W'.
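A short sketch of the outer merge on the same data is given below. The `by_key` view is an illustrative helper, and the keys are sorted before printing because the row order of an outer merge can differ between pandas versions.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['X', 'Y', 'Z'], 'val1': [10, 20, 30]})
df2 = pd.DataFrame({'key': ['Y', 'Z', 'W'], 'val2': [40, 50, 60]})

# Outer merge: union of keys, NaN where one side has no matching row.
result = pd.merge(df1, df2, on='key', how='outer')

# Index by key so the missing cells are easy to inspect (illustrative helper).
by_key = result.set_index('key')

print(sorted(result['key']))
print(by_key.isna())
```

Because NaN is a float, val1 and val2 are upcast from integers to floats in the merged result.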