Feature engineering means creating new, useful features (columns) from existing data. Well-chosen features make it easier for machine learning models to learn and find patterns.
Feature engineering basics in Pandas
Introduction
Feature engineering is useful in situations like these:
When you want to improve a model's accuracy by adding new information.
When your data is raw and needs cleaning or transforming before analysis.
When you want to combine or split columns to get better insights.
When you want to convert text or dates into numbers for machine learning.
When you want to reduce the number of features but keep important information.
Syntax
    import pandas as pd

    # df is an existing DataFrame; the column names and `function` are placeholders
    # Create a new feature by combining columns
    df['col_sum'] = df['col1'] + df['col2']
    # Create a feature by applying a function to each value
    df['col_applied'] = df['col'].apply(function)
    # Create a feature from a datetime column
    df['year'] = df['date_col'].dt.year
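New columns can come from simple arithmetic on existing columns or from applying a function to each value.
The .dt accessor extracts parts of a datetime column, such as the year, month, or day. The column must have a datetime dtype, so convert strings with pd.to_datetime first.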
Examples
This example creates a new feature by multiplying age and income.
    import pandas as pd

    data = {'age': [25, 32, 47], 'income': [50000, 60000, 80000]}
    df = pd.DataFrame(data)
    df['age_income'] = df['age'] * df['income']
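Columns can also be combined by division to form a ratio feature. The sketch below reuses the same data; the column name `income_per_year_of_age` is a hypothetical choice for illustration, not part of the original example.

```python
import pandas as pd

data = {'age': [25, 32, 47], 'income': [50000, 60000, 80000]}
df = pd.DataFrame(data)

# Interaction feature from the example above: age times income, row by row
df['age_income'] = df['age'] * df['income']

# Hypothetical ratio feature: income divided by age, row by row
df['income_per_year_of_age'] = df['income'] / df['age']

print(df)
```

If a denominator can be zero, pandas returns inf (or NaN for 0/0) instead of raising an error, so check ratio columns for those values.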
This example extracts year and month from a date column.
    import pandas as pd

    data = {'date': pd.to_datetime(['2020-01-01', '2021-06-15'])}
    df = pd.DataFrame(data)
    df['year'] = df['date'].dt.year
    df['month'] = df['date'].dt.month
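The .dt accessor exposes more than year and month. This sketch starts from dates stored as strings, converts them with pd.to_datetime, and then extracts the day and the day of the week. The third date and the `is_weekend` flag are hypothetical additions for illustration.

```python
import pandas as pd

# Dates stored as strings must be converted before .dt can be used
df = pd.DataFrame({'date': ['2020-01-01', '2021-06-15', '2021-06-19']})
df['date'] = pd.to_datetime(df['date'])

# Day of the month
df['day'] = df['date'].dt.day
# Day of the week: Monday=0 ... Sunday=6
df['day_of_week'] = df['date'].dt.dayofweek
# Hypothetical flag: 1 for Saturday or Sunday, 0 otherwise
df['is_weekend'] = (df['day_of_week'] >= 5).astype(int)

print(df)
```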
This example creates a new feature by categorizing income into 'low' or 'high'.
    import pandas as pd

    def categorize_income(x):
        if x < 60000:
            return 'low'
        else:
            return 'high'

    data = {'income': [50000, 70000, 40000]}
    df = pd.DataFrame(data)
    df['income_level'] = df['income'].apply(categorize_income)
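The same categorization can be written without a custom function by binning the values with pd.cut. This is one alternative, not the only way; the bin edges below are chosen to reproduce the rule above (under 60000 is 'low', everything else is 'high').

```python
import pandas as pd

data = {'income': [50000, 70000, 40000]}
df = pd.DataFrame(data)

# Bins [-inf, 60000) -> 'low' and [60000, inf) -> 'high'
# right=False makes each bin include its left edge, matching x < 60000 -> 'low'
df['income_level'] = pd.cut(
    df['income'],
    bins=[float('-inf'), 60000, float('inf')],
    labels=['low', 'high'],
    right=False,
)

print(df)
```

pd.cut returns a categorical column, and adding more edges and labels gives more than two categories.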
Sample Program
This program shows how to create new features by combining columns, extracting date parts, and categorizing numeric data.
    import pandas as pd

    # Sample data
    data = {
        'age': [22, 35, 58, 45],
        'salary': [30000, 60000, 80000, 50000],
        'join_date': pd.to_datetime(['2019-01-01', '2018-05-15', '2020-07-30', '2017-12-10'])
    }
    df = pd.DataFrame(data)

    # Create new feature: age times salary
    df['age_salary'] = df['age'] * df['salary']

    # Extract year from join_date
    df['join_year'] = df['join_date'].dt.year

    # Categorize salary
    def salary_category(salary):
        return 'High' if salary > 55000 else 'Low'

    df['salary_cat'] = df['salary'].apply(salary_category)

    print(df)
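A text column such as salary_cat usually has to be turned into numbers before a model can use it. One common approach is one-hot encoding with pd.get_dummies, sketched here on a small self-contained copy of the salary data. Passing dtype=int is a choice made here so the new columns hold 0/1 instead of True/False.

```python
import pandas as pd

df = pd.DataFrame({
    'salary': [30000, 60000, 80000, 50000],
    'salary_cat': ['Low', 'High', 'High', 'Low'],
})

# One-hot encode: one 0/1 column per category; the original text column is dropped
encoded = pd.get_dummies(df, columns=['salary_cat'], dtype=int)

print(encoded)
```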
Important Notes
Feature engineering can greatly improve model results but requires understanding your data.
Always check new features for errors or unexpected values.
Try simple features first before complex transformations.
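A quick way to check a new feature is to count missing values and look at its summary statistics. In this sketch the missing salary is a hypothetical value added to show how a gap in an input column carries over into the derived column.

```python
import pandas as pd

df = pd.DataFrame({
    'age': [22, 35, 58, 45],
    'salary': [30000, 60000, 80000, None],  # hypothetical missing value
})
df['age_salary'] = df['age'] * df['salary']

# Missing values propagate through arithmetic, so count them per column
print(df.isna().sum())

# Summary statistics help spot unexpected ranges (e.g. negative or very large values)
print(df['age_salary'].describe())
```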
Summary
Feature engineering means creating new useful columns from existing data.
It helps models learn better by giving clearer information.
Common methods include combining columns, extracting date parts, and categorizing values.