0
0
Data Analysis Pythondata~30 mins

Handling duplicate column names in Data Analysis Python - Mini Project: Build & Apply

Choose your learning style9 modes available
Handling duplicate column names
📖 Scenario: You have a table of sales data where some columns have the same name by mistake. This can cause confusion when analyzing the data.Imagine you work in a store and the sales report has two columns both named 'Price'. You want to fix this so each column has a unique name.
🎯 Goal: You will create a data table with duplicate column names, then write code to rename the duplicate columns to unique names by adding numbers at the end.
📋 What You'll Learn
Create a pandas DataFrame with duplicate column names
Create a helper variable to track column name counts
Write code to rename duplicate columns by adding suffixes
Print the final DataFrame columns to show unique names
💡 Why This Matters
🌍 Real World
In real data, duplicate column names can cause errors or confusion. Fixing them helps clean data for analysis or reporting.
💼 Career
Data analysts and scientists often clean messy data. Handling duplicate columns is a common task to prepare data for accurate analysis.
Progress0 / 4 steps
1
Create a DataFrame with duplicate column names
Create a pandas DataFrame called df with these columns: 'Product', 'Price', 'Price', and 'Quantity'. Use these rows: ['Apple', 1.2, 1.3, 10] and ['Banana', 0.5, 0.6, 20].
Data Analysis Python
Hint

Use pd.DataFrame with data and columns parameters.

2
Create a dictionary to count column name occurrences
Create an empty dictionary called col_count to keep track of how many times each column name appears.
Data Analysis Python
Hint

Use {} to create an empty dictionary.

3
Rename duplicate columns to unique names
Create a new list called new_columns. Use a for loop with variable col to go through df.columns. For each col, update col_count[col] by 1 if it exists or set to 1 if not. If col_count[col] is 1, add col to new_columns. Otherwise, add col plus underscore and the count (like Price_2) to new_columns. Finally, set df.columns = new_columns.
Data Analysis Python
Hint

Use a dictionary to count and a list to store new names. Use f"{col}_{count}" to add suffix.

4
Print the DataFrame columns to show unique names
Write print(df.columns.tolist()) to display the list of column names after renaming duplicates.
Data Analysis Python
Hint

Use print(df.columns.tolist()) to see the column names as a list.