Challenge - 5 Problems
Duplicate Column Mastery
❓ Predict Output
intermediate
Output of DataFrame with duplicate columns after selection
What is the output of the following code snippet, which selects column 'A' from a DataFrame built from a dictionary literal that repeats the key "A"?
Data Analysis Python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "A": [5, 6]})
result = df["A"]
print(result)
💡 Hint
A Python dictionary cannot hold duplicate keys; when a key is repeated in a dictionary literal, the last value replaces the earlier one.
The repeated key is resolved by Python, not by pandas. The dictionary literal already contains only two keys, 'A' and 'B', with 'A' mapped to [5, 6], so the DataFrame never has duplicate columns. Selecting df['A'] returns that single column as a Series with values 5 and 6.
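A short check of this behavior, as a sketch: the variable name `data` is introduced here for illustration and is not part of the original snippet. It suggests that the repeated key is collapsed before pandas receives the dictionary, so the resulting DataFrame has two columns rather than three.

```python
import pandas as pd

# `data` is a name introduced for this sketch.
# Python keeps one entry per key: the key keeps its first position,
# and its value is the last one assigned in the literal.
data = {"A": [1, 2], "B": [3, 4], "A": [5, 6]}
print(list(data))        # ['A', 'B']

df = pd.DataFrame(data)
result = df["A"]         # a Series, because only one column is labeled 'A'
print(df.shape)          # (2, 2)
print(result.tolist())   # [5, 6]
```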
❓ Data Output
intermediate
Number of columns after reading CSV with duplicate headers
Given a CSV file with headers: 'X,Y,X', what will be the number of columns in the DataFrame after reading it with pandas default settings?
Data Analysis Python
import pandas as pd
from io import StringIO

csv_data = "X,Y,X\n1,2,3\n4,5,6"
df = pd.read_csv(StringIO(csv_data))
print(len(df.columns))
💡 Hint
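By default, read_csv keeps every column from the file and makes repeated header names unique instead of dropping them.
With default settings, pandas renames duplicate headers when reading a CSV file by appending a numeric suffix, so the columns become 'X', 'Y', and 'X.1'. No column is dropped, so the DataFrame has 3 columns.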
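To see what the default reader does with the repeated header, the quiz snippet can be extended to print the column labels themselves. This is a minimal sketch; the name `columns` is introduced here for illustration.

```python
import pandas as pd
from io import StringIO

csv_data = "X,Y,X\n1,2,3\n4,5,6"
df = pd.read_csv(StringIO(csv_data))

# `columns` is a name introduced for this sketch.
columns = list(df.columns)
print(columns)        # ['X', 'Y', 'X.1']
print(len(columns))   # 3
```

The count is 3 either way, but the second 'X' has to be addressed as 'X.1' in later code.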
🔧 Debug
advanced
Identify the behavior when accessing duplicate columns by attribute
What happens when you access a duplicated column name by attribute on a pandas DataFrame?
Data Analysis Python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "A"])
print(df.A)
💡 Hint
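Attribute access follows the same lookup as df["A"], so consider what that lookup returns when two columns share a label.
With duplicate column names, df.A does not raise AttributeError. It resolves to df["A"], which returns a DataFrame containing every column labeled 'A' instead of a single Series. Errors tend to surface later, in code that assumes a Series. To pick one of the duplicates, select by position, for example df.iloc[:, 0].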
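The snippet can be checked directly. The sketch below assumes a reasonably recent pandas release; the names `by_attr`, `by_key`, and `first_a` are introduced here for illustration. It suggests that attribute access on a duplicated label returns a two-column DataFrame rather than raising.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "A"])

# Names below are introduced for this sketch.
by_attr = df.A       # same lookup as df["A"]
by_key = df["A"]

print(type(by_attr).__name__)   # DataFrame
print(by_attr.shape)            # (2, 2)

# Positional selection picks exactly one of the duplicated columns.
first_a = df.iloc[:, 0]
print(first_a.tolist())         # [1, 3]
```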
🚀 Application
advanced
Resolving duplicate columns after concatenation
After concatenating two DataFrames with overlapping column names, which method correctly renames duplicate columns to unique names?
Data Analysis Python
import pandas as pd

df1 = pd.DataFrame({"A": [1], "B": [2]})
df2 = pd.DataFrame({"A": [3], "B": [4]})
df_concat = pd.concat([df1, df2], axis=1)
# Which code renames duplicates correctly?
💡 Hint
Use enumerate to append each column's position to its name, so that every name becomes unique.
Option C appends an index to each column name, making all column names unique. Other options either remove duplicates or cause errors.
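The answer options are not reproduced here, so the following is only a sketch of the enumeration approach the hint describes, not necessarily the exact code of Option C. The `name_position` naming scheme is an assumption made for this example.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1], "B": [2]})
df2 = pd.DataFrame({"A": [3], "B": [4]})
df_concat = pd.concat([df1, df2], axis=1)   # columns: A, B, A, B

# Assumed naming scheme for this sketch: "<name>_<position>".
df_concat.columns = [f"{name}_{i}" for i, name in enumerate(df_concat.columns)]

print(list(df_concat.columns))      # ['A_0', 'B_1', 'A_2', 'B_3']
print(df_concat.columns.is_unique)  # True
```

Appending the position to every column keeps all of the data and guarantees uniqueness, at the cost of renaming columns that were not duplicated in the first place.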
🧠 Conceptual
expert
Effect of duplicate columns on groupby aggregation
If a DataFrame has duplicate column names and you perform a groupby aggregation on one of these columns, what is the expected behavior?
💡 Hint
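pandas treats column names as labels; a duplicated name means several columns share the same label.
When the aggregated column name appears more than once, pandas applies the aggregation to every column with that name, so the result contains multiple aggregated columns carrying the same label. Using a duplicated name as the grouping key is a different case: pandas raises a ValueError, because the grouper it selects is not 1-dimensional.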
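A small illustration of aggregating over a duplicated value column, as a sketch: the data and the names `key` and `summed` are made up for this example, and the grouping key itself is unique.

```python
import pandas as pd

# Illustrative data: one unique grouping column and two value columns both labeled "A".
df = pd.DataFrame(
    [["x", 1, 10], ["x", 2, 20], ["y", 3, 30]],
    columns=["key", "A", "A"],
)

# Each column labeled "A" is aggregated separately.
summed = df.groupby("key").sum()

print(list(summed.columns))        # ['A', 'A']
print(summed.to_numpy().tolist())  # [[3, 30], [3, 30]]
```

Because both result columns share a label, renaming the duplicates before grouping usually makes the output easier to work with.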