We add or remove columns to change the data we want to analyze. This helps us focus on important information or clean up the data.
Adding and removing columns in Data Analysis Python
Introduction
You want to create a new column based on existing data, like calculating age from birth year.
You need to remove columns that are not useful or have too many missing values.
You want to add a column to label data, like marking sales as 'high' or 'low'.
You want to simplify the dataset by keeping only relevant columns for your analysis.
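The situations listed above can be sketched in one short script. This is a minimal illustration, not a fixed recipe: the column names, the sample values, the reference year 2024, the sales threshold of 100, and the 50% missing-value cutoff are all assumptions made up for the example.

```python
import pandas as pd

# Hypothetical sample data; names and values are illustrative only.
df = pd.DataFrame({
    'name': ['Ann', 'Ben', 'Cara'],
    'birth_year': [1990, 1985, 2000],
    'sales': [120, 80, 300],
    'notes': [None, None, 'vip'],
})

# New column from existing data: approximate age from birth year.
reference_year = 2024  # assumed reference year
df['age'] = reference_year - df['birth_year']

# Label column: mark sales as 'high' or 'low' (threshold of 100 is assumed).
df['sales_level'] = df['sales'].apply(lambda s: 'high' if s >= 100 else 'low')

# Remove columns with too many missing values (more than half, as an assumed cutoff).
missing_share = df.isna().mean()
df = df.drop(columns=missing_share[missing_share > 0.5].index)

# Simplify the dataset by keeping only the columns relevant to the analysis.
df = df[['name', 'age', 'sales_level']]
```

Selecting a list of column names, as in the last step, is often easier than dropping many columns one by one when only a few are needed.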
Syntax
Data Analysis Python
import pandas as pd

# Adding a column
df['new_column'] = values

# Removing a column
df = df.drop('column_name', axis=1)
Pass axis=1 so that drop removes a column. The default, axis=0, removes rows by index label instead.
Adding a column can be done by assigning a list, a single value, or a calculation.
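The three ways of adding a column mentioned above, plus a variant of drop, might look like the sketch below. The DataFrame and the column names 'from_list', 'flag', and 'ratio' are made up for illustration. The columns= keyword is a pandas alternative to axis=1 that some find easier to read.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Assign a list: its length must match the number of rows.
df['from_list'] = [10, 20, 30]

# Assign a single value: it is repeated for every row.
df['flag'] = True

# Assign a calculation based on existing columns.
df['ratio'] = df['B'] / df['A']

# drop(columns=...) does the same job as drop(..., axis=1);
# passing a list removes several columns at once.
trimmed = df.drop(columns=['from_list', 'flag'])
```

Assigning a list whose length differs from the number of rows raises a ValueError, so it is worth checking len(df) first when the values come from elsewhere.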
Examples
This adds a new column 'C' which is the sum of columns 'A' and 'B'.
Data Analysis Python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

# Add column C as sum of A and B
df['C'] = df['A'] + df['B']
This removes the column 'B' from the DataFrame.
Data Analysis Python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

# Remove column B
df = df.drop('B', axis=1)
This adds a new column 'constant' with value 10 for every row.
Data Analysis Python
import pandas as pd

df = pd.DataFrame({'A': [1, 2]})

# Add a column with the same value for all rows
df['constant'] = 10
Sample Program
This program creates a table with names, ages, and salaries. It adds a new column showing age after 5 years. Then it removes the salary column. Finally, it prints the updated table.
Data Analysis Python
import pandas as pd

# Create a simple DataFrame
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [50000, 60000, 70000]
})

# Add a new column 'Age in 5 years'
df['Age in 5 years'] = df['Age'] + 5

# Remove the 'Salary' column
df = df.drop('Salary', axis=1)

print(df)
Output
      Name  Age  Age in 5 years
0    Alice   25              30
1      Bob   30              35
2  Charlie   35              40
Important Notes
drop returns a new DataFrame and leaves the original unchanged unless you assign the result back or pass inplace=True.
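A small sketch of that behavior, using a made-up two-column DataFrame. The variable names result and cols_after_plain_drop are chosen only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})

# drop returns a new DataFrame; df itself still has column B.
result = df.drop('B', axis=1)
cols_after_plain_drop = list(df.columns)  # still ['A', 'B']

# With inplace=True, df is modified directly and the call returns None.
df.drop('B', axis=1, inplace=True)
```

Assigning the result back (df = df.drop(...)) is the style used in the rest of this page. Because the inplace=True call returns None, writing df = df.drop(..., inplace=True) would replace the DataFrame with None.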
Adding columns with calculations helps create new insights from existing data.
Summary
You add columns to include new information or calculations.
You remove columns to clean or simplify your data.
Use df['new_col'] = ... to add and df.drop(..., axis=1) to remove columns.