
Feature engineering basics in Pandas

Introduction

Feature engineering means deriving new, more informative columns from the data you already have. Well-chosen features make patterns easier for a machine learning model to find.

Typical uses:
Improving a model's accuracy by adding new information.
Cleaning or transforming raw data before analysis.
Combining or splitting columns to get better insights.
Converting text or dates into numbers for machine learning.
Reducing the number of features while keeping the important information.
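Two of the cases above, splitting a column and converting text into numbers, can be sketched as follows. This is a minimal illustration, not the only approach: the sample data and the column names (`full_name`, `city`) are invented for the example, and one-hot encoding with `pd.get_dummies` is just one common way to turn text categories into numeric columns.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration
df = pd.DataFrame({
    'full_name': ['Ana Silva', 'Ben Okafor', 'Chen Wei'],
    'city': ['Lisbon', 'Lagos', 'Lisbon'],
})

# Split one text column into two new columns at the first space
df[['first_name', 'last_name']] = df['full_name'].str.split(' ', n=1, expand=True)

# Convert a text category into numeric indicator columns (one-hot encoding)
city_dummies = pd.get_dummies(df['city'], prefix='city', dtype=int)
df = pd.concat([df, city_dummies], axis=1)

print(df)
```

Each distinct city becomes its own 0/1 column, which most machine learning libraries can consume directly.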
Syntax
Pandas
import pandas as pd

# df is an existing DataFrame; the column names below are placeholders

# Create a new feature by combining columns
df['new_feature'] = df['col1'] + df['col2']

# Create a feature by applying a function to each value
df['new_feature'] = df['col'].apply(function)

# Create a feature from a datetime column (convert with pd.to_datetime first if needed)
df['year'] = df['date_col'].dt.year

You can create new columns with simple arithmetic or by applying a function to an existing column.

Date columns can be split into year, month, and day through the .dt accessor, provided the column has a datetime dtype.

Examples
This example creates a new feature by multiplying age and income.
Pandas
import pandas as pd

data = {'age': [25, 32, 47], 'income': [50000, 60000, 80000]}
df = pd.DataFrame(data)

df['age_income'] = df['age'] * df['income']
This example extracts year and month from a date column.
Pandas
import pandas as pd

data = {'date': pd.to_datetime(['2020-01-01', '2021-06-15'])}
df = pd.DataFrame(data)

df['year'] = df['date'].dt.year

df['month'] = df['date'].dt.month
This example creates a new feature by categorizing income into 'low' or 'high'.
Pandas
import pandas as pd

def categorize_income(x):
    if x < 60000:
        return 'low'
    else:
        return 'high'

data = {'income': [50000, 70000, 40000]}
df = pd.DataFrame(data)

df['income_level'] = df['income'].apply(categorize_income)
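The same categorization can also be done without a row-by-row function. The sketch below uses `pd.cut` as an alternative to `apply`; the bin edges are chosen here to mirror the `x < 60000` rule above and are an assumption of this example, not part of the original.

```python
import pandas as pd

df = pd.DataFrame({'income': [50000, 70000, 40000]})

# With right=False each bin includes its left edge and excludes its right edge,
# so values below 60000 fall in 'low' and values of 60000 or more fall in 'high'
df['income_level'] = pd.cut(
    df['income'],
    bins=[float('-inf'), 60000, float('inf')],
    labels=['low', 'high'],
    right=False,
)

print(df)
```

`pd.cut` returns a categorical column, which keeps the label order ('low' before 'high') and scales to more than two bins by adding edges and labels.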
Sample Program

This program shows how to create new features by combining columns, extracting date parts, and categorizing numeric data.

Pandas
import pandas as pd

# Sample data
data = {
    'age': [22, 35, 58, 45],
    'salary': [30000, 60000, 80000, 50000],
    'join_date': pd.to_datetime(['2019-01-01', '2018-05-15', '2020-07-30', '2017-12-10'])
}

df = pd.DataFrame(data)

# Create new feature: age times salary
df['age_salary'] = df['age'] * df['salary']

# Extract year from join_date
df['join_year'] = df['join_date'].dt.year

# Categorize salary
def salary_category(salary):
    return 'High' if salary > 55000 else 'Low'

df['salary_cat'] = df['salary'].apply(salary_category)

print(df)
Important Notes

Feature engineering can greatly improve model results but requires understanding your data.

Always check new features for errors or unexpected values.

Try simple features first before complex transformations.
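As a rough sketch of the advice to check new features, the snippet below rebuilds two features from the sample program and inspects them. The specific checks (missing-value counts, summary statistics, category counts) are one reasonable starting set, not an exhaustive procedure.

```python
import pandas as pd

df = pd.DataFrame({
    'age': [22, 35, 58, 45],
    'salary': [30000, 60000, 80000, 50000],
})
df['age_salary'] = df['age'] * df['salary']
df['salary_cat'] = df['salary'].apply(lambda s: 'High' if s > 55000 else 'Low')

# Count missing values per column; new features should not introduce unexpected NaNs
missing_counts = df.isna().sum()

# Summary statistics reveal implausible ranges in a numeric feature
numeric_summary = df['age_salary'].describe()

# Category counts reveal empty, misspelled, or badly unbalanced labels
category_counts = df['salary_cat'].value_counts()

print(missing_counts)
print(numeric_summary)
print(category_counts)
```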

Summary

Feature engineering means creating new useful columns from existing data.

It helps models learn better by giving clearer information.

Common methods include combining columns, extracting date parts, and categorizing values.