
Feature engineering basics in Pandas - Cheat Sheet & Quick Revision

Recall & Review
beginner
What is feature engineering in data science?
Feature engineering is the process of creating new input variables or modifying existing ones to improve a machine learning model's performance.
beginner
Why is feature engineering important?
It helps models learn better by providing more meaningful or clearer information, often leading to improved accuracy and insights.
beginner
Name three common feature engineering techniques.
1. Creating new features by combining existing ones (e.g., ratios)
2. Encoding categorical variables (e.g., one-hot encoding)
3. Scaling or normalizing numerical features
beginner
How can you create a new feature in pandas?
You can create a new column by performing operations on existing columns, for example: df['new_feature'] = df['A'] + df['B'].
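As a minimal sketch of the answer above, the snippet below builds a small DataFrame and derives two new columns from it. The data values and the `a_to_b` column name are made up for illustration; only `A`, `B`, and `new_feature` come from the flashcard.

```python
import pandas as pd

# Hypothetical toy data; the column names A and B follow the flashcard.
df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

# New column from an element-wise sum of two existing columns.
df["new_feature"] = df["A"] + df["B"]

# A ratio feature (assumed name), another common way to combine columns.
df["a_to_b"] = df["A"] / df["B"]

print(df)
```

Both assignments work row by row, so the new columns always have the same length as the DataFrame.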
beginner
What is one-hot encoding and when do you use it?
One-hot encoding converts a categorical variable into multiple binary columns (0 or 1), one per category. Use it when a model needs numeric input and the categories have no natural order, so that the encoding does not imply one.
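One possible way to do this in pandas is `pd.get_dummies`. The sketch below assumes a hypothetical `color` column; the data is invented for illustration.

```python
import pandas as pd

# Hypothetical categorical column with no natural order.
df = pd.DataFrame({"color": ["red", "green", "red", "blue"]})

# One binary column per category; dtype=int gives 0/1 instead of booleans.
encoded = pd.get_dummies(df, columns=["color"], dtype=int)

print(encoded)
```

Each row has exactly one 1 across the new `color_*` columns, which is what "one-hot" refers to.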
Which of the following is NOT a feature engineering technique?
A. Creating new features from existing data
B. Removing missing values
C. Encoding categorical variables
D. Scaling numerical features
In pandas, how do you create a new feature 'total' by adding columns 'A' and 'B'?
A. df['total'] = df['A'] + df['B']
B. df.total = df.A + df.B
C. df.add('total', df['A'], df['B'])
D. df['total'] = df['A'] * df['B']
What does one-hot encoding do?
A. Converts numerical data into categories
B. Scales features between 0 and 1
C. Turns categorical variables into binary columns
D. Removes duplicate rows
Why might you create a new feature like 'age_group' from 'age'?
A. To group continuous data into meaningful categories
B. To remove outliers
C. To reduce the number of missing values
D. To encode text data
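For the 'age_group' question above, one way to bin a continuous column is `pd.cut`. This is only a sketch: the bin edges and the labels `child`, `adult`, and `senior` are assumptions chosen for illustration, not part of the quiz.

```python
import pandas as pd

# Hypothetical ages.
df = pd.DataFrame({"age": [5, 17, 30, 64, 80]})

# Assumed bin edges and labels; right=False makes each bin include its
# left edge and exclude its right edge: [0, 18), [18, 65), [65, 120).
bins = [0, 18, 65, 120]
labels = ["child", "adult", "senior"]
df["age_group"] = pd.cut(df["age"], bins=bins, labels=labels, right=False)

print(df)
```

The result is a categorical column, which can then be one-hot encoded like any other categorical feature.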
Which of the following tools helps to scale numerical features in a pandas DataFrame?
A. df.drop_duplicates()
B. df.fillna()
C. pd.get_dummies()
D. sklearn.preprocessing.StandardScaler()
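To show what scaling does to a column, the sketch below uses plain pandas arithmetic rather than scikit-learn. The `height_cm` column and its values are hypothetical. The z-score line uses `ddof=0` (population standard deviation), which is the convention `StandardScaler` follows by default.

```python
import pandas as pd

# Hypothetical numeric feature.
df = pd.DataFrame({"height_cm": [150.0, 160.0, 170.0, 180.0]})
col = df["height_cm"]

# Standardization (z-score): subtract the mean, divide by the population std.
df["height_z"] = (col - col.mean()) / col.std(ddof=0)

# Min-max normalization: rescale the values to the range [0, 1].
df["height_minmax"] = (col - col.min()) / (col.max() - col.min())

print(df)
```

In a real modelling workflow the mean, standard deviation, minimum, and maximum would normally be computed on the training data only and then reused on new data, which is the bookkeeping a scaler object handles for you.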
Explain what feature engineering is and why it matters in simple terms.
Think about how changing or adding data columns can help a model learn better.
Describe how you would create a new feature in a pandas DataFrame using existing columns.
Remember the syntax for adding a new column in pandas.