Date and time feature extraction in ML Python - Model Metrics & Evaluation

Metrics & Evaluation - Date and time feature extraction
Which metric matters for Date and Time Feature Extraction and WHY

Date and time feature extraction turns raw date/time values into numbers or categories a model can use. Because these features are inputs rather than predictions, the metric to check is the model's own performance metric (accuracy, precision, recall, or RMSE) after the features are added. Comparing it with the same model trained without them shows whether the extracted features help the model learn.

Why? Date/time features have no metric of their own, so we measure whether they improve the model's predictions. For example, extracting "hour of day" or "day of week" might help a sales prediction model. If accuracy rises or error falls on held-out data, the features are useful.
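A minimal sketch of this before/after comparison, assuming synthetic hourly sales (invented for illustration, not real data) and two deliberately simple models: a baseline that predicts the overall mean, and one that predicts the mean for each hour of day. The helper names make_synthetic_sales and rmse are hypothetical. If the extracted hour feature carries signal, RMSE on the later, held-out period should drop.

```python
from datetime import datetime, timedelta
import math
import random

def make_synthetic_sales(n_hours, seed=0):
    """Hypothetical hourly sales: busier from 11:00 to 13:59, plus Gaussian noise."""
    rng = random.Random(seed)
    start = datetime(2024, 1, 1)
    rows = []
    for i in range(n_hours):
        ts = start + timedelta(hours=i)
        base = 50.0 if 11 <= ts.hour <= 13 else 10.0
        rows.append((ts, base + rng.gauss(0, 3)))
    return rows

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

rows = make_synthetic_sales(24 * 28)  # four weeks of hourly data, already in time order
cutoff = int(len(rows) * 0.75)        # train on the first three weeks, test on the last
train, test = rows[:cutoff], rows[cutoff:]

# Baseline: ignore the timestamp and predict the overall training mean.
global_mean = sum(y for _, y in train) / len(train)

# With an extracted date/time feature: predict the training mean for each hour of day.
by_hour = {}
for ts, y in train:
    by_hour.setdefault(ts.hour, []).append(y)
hour_mean = {h: sum(v) / len(v) for h, v in by_hour.items()}

y_test = [y for _, y in test]
rmse_baseline = rmse(y_test, [global_mean] * len(test))
rmse_hour = rmse(y_test, [hour_mean.get(ts.hour, global_mean) for ts, _ in test])
print(f"RMSE without hour feature: {rmse_baseline:.2f}")
print(f"RMSE with hour feature:    {rmse_hour:.2f}")
```

The split is by time (earlier weeks for training, the last week for testing), so the comparison is not flattered by leakage from the future.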

Confusion Matrix or Equivalent Visualization

For classification tasks using date/time features, a confusion matrix shows how well the model predicts classes.

      Actual \ Predicted |  Positive | Negative
      -------------------|-----------|---------
      Positive           |    TP=50  |   FN=10
      Negative           |    FP=5   |   TN=35
    

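Here, TP (true positives) counts positive cases the model predicted correctly, FN (false negatives) counts positive cases it missed, FP (false positives) counts negative cases it wrongly predicted as positive, and TN (true negatives) counts negative cases it predicted correctly. Precision and recall are computed from these counts, so comparing the matrix before and after adding date/time features shows whether the features reduce errors.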
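The arithmetic behind precision, recall, accuracy and F1 for the example matrix, as a plain-Python sketch (no library assumed; the counts are the ones in the table above).

```python
# Counts from the example confusion matrix
tp, fn, fp, tn = 50, 10, 5, 35

precision = tp / (tp + fp)  # of predicted positives, the share that were right
recall = tp / (tp + fn)     # of actual positives, the share the model found
accuracy = (tp + tn) / (tp + fn + fp + tn)
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.3f} recall={recall:.3f} accuracy={accuracy:.2f} f1={f1:.3f}")
# precision=0.909 recall=0.833 accuracy=0.85 f1=0.870
```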

Precision vs Recall Tradeoff with Concrete Examples

Imagine a model predicting if a store will be busy based on time features like "hour" or "holiday".

  • High Precision: The model only says "busy" when very sure. Few false alarms. Good if you want to avoid wasting staff.
  • High Recall: The model catches almost all busy times, even if some false alarms happen. Good if missing busy times is costly.

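Which one to prioritize depends on the problem; moving the model's decision threshold shifts the balance between them. Informative date/time features that capture patterns like rush hours or weekends can improve both at once, because they separate busy and quiet periods more cleanly.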
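A small sketch of the tradeoff, assuming the model outputs a "busy" score between 0 and 1 for each hour. The scores and labels below are invented for illustration, and precision_recall_at is a hypothetical helper. A strict threshold gives fewer false alarms (higher precision); a loose one catches more busy hours (higher recall).

```python
def precision_recall_at(threshold, scores, labels):
    """Treat score >= threshold as a 'busy' prediction; return (precision, recall)."""
    preds = [s >= threshold for s in scores]
    tp = sum(p and y == 1 for p, y in zip(preds, labels))
    fp = sum(p and y == 0 for p, y in zip(preds, labels))
    fn = sum((not p) and y == 1 for p, y in zip(preds, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical scores for ten hours (label 1 = the hour was actually busy)
scores = [0.95, 0.85, 0.80, 0.70, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]

for t in (0.75, 0.35):
    p, r = precision_recall_at(t, scores, labels)
    print(f"threshold={t}: precision={p:.2f} recall={r:.2f}")
# threshold=0.75: precision=1.00 recall=0.60
# threshold=0.35: precision=0.71 recall=1.00
```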

What "Good" vs "Bad" Metric Values Look Like for Date and Time Feature Extraction

Good:

  • Model accuracy or F1 score improves noticeably after adding date/time features.
  • Precision and recall increase, showing better detection of important cases.
  • Errors like RMSE decrease in regression tasks.

Bad:

  • No change or worse model performance after adding date/time features.
  • High false positives or false negatives remain, meaning features don't help.
  • Overfitting signs: model performs well on training but poorly on new data.
Common Metrics Pitfalls
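  • Ignoring time leakage: Training on information from after the prediction time (for example, aggregates computed over the whole dataset, test period included) can falsely boost metrics.
  • Accuracy paradox: Accuracy can look high on imbalanced data (e.g., most days are not busy) even when the model misses most of the rare cases.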
  • Overfitting: Extracting too many date/time features can cause the model to memorize patterns that don't generalize.
  • Not validating on time-based splits: Random splits ignore time order and let future rows into training, so the model is evaluated on a period it has partly seen, giving misleading metrics.
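A minimal time-based split with pandas, assuming a DataFrame with a 'timestamp' column (the daily sales values are invented, and time_based_split is a hypothetical helper). Rows are sorted by time and split at a cutoff, so every validation row is later than every training row.

```python
import pandas as pd

# Hypothetical daily data; only the ordering by 'timestamp' matters here.
df = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=10, freq="D"),
    "sales": [12, 15, 11, 30, 28, 14, 13, 16, 31, 29],
})

def time_based_split(frame, time_col, train_frac=0.8):
    """Train on the earliest rows and validate on the latest ones (no shuffling)."""
    ordered = frame.sort_values(time_col).reset_index(drop=True)
    cutoff = int(len(ordered) * train_frac)
    return ordered.iloc[:cutoff], ordered.iloc[cutoff:]

train, valid = time_based_split(df, "timestamp")
print(train["timestamp"].max(), "<", valid["timestamp"].min())
```

To avoid leakage, any statistic derived from the data (such as per-hour averages) should be computed on the training part only. For several such folds, scikit-learn's TimeSeriesSplit applies the same idea.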
Self Check

Your sales prediction model has 85% accuracy but only 40% recall on busy days after adding date/time features. Is it good for production?

Answer: No, because the model misses 60% of busy days (low recall). It often fails to predict important busy times, which could hurt staffing or inventory decisions. Improve recall first, for example by adding or refining date/time features or by lowering the decision threshold.
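A worked example of how 85% accuracy and 40% recall can coexist when busy days are rare. The counts below are hypothetical, chosen only so that they match the two numbers in the question.

```python
# Hypothetical counts for 100 days, 20 of which were actually busy
tp, fn = 8, 12   # busy days caught vs. missed
fp, tn = 3, 77   # quiet days wrongly flagged vs. correctly left alone

accuracy = (tp + tn) / (tp + fn + fp + tn)
recall = tp / (tp + fn)
print(f"accuracy={accuracy:.0%} recall={recall:.0%} missed busy days={fn / (tp + fn):.0%}")
# accuracy=85% recall=40% missed busy days=60%
```

Most of the accuracy comes from the 77 quiet days, which is the accuracy paradox in action.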

Key Result
Date and time feature extraction is useful if it improves model metrics like accuracy, precision, recall, or error; watch out for time leakage and validate properly.

Practice

1. Which of the following is a common feature extracted from a date to help machine learning models?
easy
A. Font size
B. Color
C. Month
D. Temperature

Solution

  1. Step 1: Understand date features

    Date features include parts of a date like year, month, day, hour, and weekday.
  2. Step 2: Identify relevant feature

    Among the options, only 'Month' is a part of a date and useful for models.
  3. Final Answer:

    Month -> Option C
  4. Quick Check:

    Date feature = Month [OK]
Hint: Pick the option that relates directly to date parts [OK]
Common Mistakes:
  • Choosing unrelated features like color or font size
  • Confusing date features with unrelated data
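A short pandas sketch of extracting the date parts listed above (year, month, day, hour, weekday) from a datetime column; the column name 'date' and the two timestamps are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({"date": pd.to_datetime(["2024-03-15 08:45", "2024-12-31 23:10"])})

# Each .dt property returns a new Series with one date part per row
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month
df["day"] = df["date"].dt.day
df["hour"] = df["date"].dt.hour
df["weekday"] = df["date"].dt.weekday  # Monday=0 ... Sunday=6

print(df.drop(columns="date"))
```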
2. Which Python code correctly extracts the weekday from a pandas datetime column named 'date'?
easy
A. df['weekday'] = df['date'].dt.weekday
B. df['weekday'] = df['date'].weekday()
C. df['weekday'] = df['date'].weekday
D. df['weekday'] = df['date'].dt.weekday()

Solution

  1. Step 1: Recall pandas datetime accessor

    To extract the weekday, use the .dt accessor followed by .weekday without parentheses, because weekday is a property, not a method.
  2. Step 2: Check each option

    Option A (df['date'].dt.weekday) uses the .dt accessor and the weekday property correctly. Option B (df['date'].weekday()) calls weekday() directly on the Series, which has no such method, so it raises AttributeError. Option C (df['date'].weekday) also skips .dt and fails the same way. Option D (df['date'].dt.weekday()) adds parentheses after .weekday, which tries to call the returned Series and raises TypeError.
  3. Final Answer:

    df['weekday'] = df['date'].dt.weekday -> Option A
  4. Quick Check:

    Use .dt.weekday without parentheses [OK]
Hint: Use .dt.weekday without parentheses for pandas datetime [OK]
Common Mistakes:
  • Calling weekday() as a method on series
  • Missing .dt accessor
  • Adding parentheses after .weekday
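A sketch contrasting the weekday property with a .dt method that does need parentheses (day_name()), plus the two failing forms from the options. The dates are assumptions for illustration; the exact error messages depend on the pandas version, so only the exception type is printed.

```python
import pandas as pd

s = pd.Series(pd.to_datetime(["2024-06-03", "2024-06-08"]))

weekday = s.dt.weekday     # property: no parentheses
names = s.dt.day_name()    # method: parentheses required
print(weekday.tolist(), names.tolist())

failing = [
    ("s.weekday()", lambda: s.weekday()),        # no weekday attribute on the Series itself
    ("s.dt.weekday()", lambda: s.dt.weekday()),  # tries to call the Series the property returns
]
for label, call in failing:
    try:
        call()
    except (AttributeError, TypeError) as exc:
        print(f"{label} -> {type(exc).__name__}")
```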
3. Given the code:
import pandas as pd
df = pd.DataFrame({'date': pd.to_datetime(['2024-06-01 14:30', '2024-06-02 09:15'])})
df['hour'] = df['date'].dt.hour
df['is_weekend'] = df['date'].dt.weekday >= 5
print(df[['hour', 'is_weekend']].to_dict())

What is the printed output?
medium
A. {'hour': {0: 14, 1: 9}, 'is_weekend': {0: False, 1: False}}
B. {'hour': {0: 14, 1: 9}, 'is_weekend': {0: True, 1: True}}
C. {'hour': {0: 14, 1: 9}, 'is_weekend': {0: False, 1: True}}
D. SyntaxError

Solution

  1. Step 1: Extract hour values

    The first date has hour 14, second has hour 9, so 'hour' column is {0:14, 1:9}.
  2. Step 2: Determine weekend flags

    pandas numbers weekdays from Monday=0 to Sunday=6, so 5 and 6 are the weekend. 2024-06-01 is a Saturday (5) and 2024-06-02 is a Sunday (6), so 'is_weekend' should be True for both.
  3. Step 3: Check code logic

    Code uses df['date'].dt.weekday >= 5, which is True for both dates. So 'is_weekend' is {0: True, 1: True}.
  4. Final Answer:

    {'hour': {0: 14, 1: 9}, 'is_weekend': {0: True, 1: True}} -> Option B
  5. Quick Check:

    Weekend days are 5 or 6, both dates match [OK]
Hint: Check weekday numbers: 5=Saturday, 6=Sunday for weekend [OK]
Common Mistakes:
  • Assuming weekend is false for Saturday/Sunday
  • Mixing hour extraction with weekend logic
  • Misreading weekday numbers
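A quick cross-check of the weekday numbers without pandas, using Python's standard datetime module. Its date.weekday() uses the same Monday=0 numbering as pandas .dt.weekday, while date.isoweekday() uses Monday=1, a common source of off-by-one mistakes.

```python
from datetime import date

for d in (date(2024, 6, 1), date(2024, 6, 2)):
    # weekday(): Monday=0 ... Sunday=6 (same numbering as pandas .dt.weekday)
    # isoweekday(): Monday=1 ... Sunday=7
    print(d, d.strftime("%A"), d.weekday(), d.isoweekday())
# 2024-06-01 Saturday 5 6
# 2024-06-02 Sunday 6 7
```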
4. The following code aims to add a 'month' feature from a datetime column but throws an error:
df['month'] = df['date'].month

What is the error, and how do you fix it?
medium
A. AttributeError because .month must be accessed via .dt; fix: df['date'].dt.month
B. SyntaxError due to missing parentheses; fix: df['date'].month()
C. TypeError because 'date' is not datetime; fix: convert to datetime first
D. No error; code is correct

Solution

  1. Step 1: Understand pandas datetime access

    Datetime properties like month must be accessed with .dt when working on a pandas Series.
  2. Step 2: Identify error cause

    Using df['date'].month tries to get 'month' attribute of the Series, causing AttributeError.
  3. Step 3: Correct code

    Use df['date'].dt.month to extract month correctly.
  4. Final Answer:

    AttributeError because .month must be accessed via .dt; fix: df['date'].dt.month -> Option A
  5. Quick Check:

    Use .dt.month for pandas datetime columns [OK]
Hint: Always use .dt before datetime properties on pandas Series [OK]
Common Mistakes:
  • Missing .dt accessor
  • Trying to call .month() as a method
  • Not converting column to datetime type
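A sketch covering both fixes above, assuming the 'date' column starts out as strings (the sample values are invented): .dt fails on non-datetime values, so the column is converted with pd.to_datetime before .dt.month is used.

```python
import pandas as pd

df = pd.DataFrame({"date": ["2024-01-15", "2024-07-04"]})  # stored as strings

try:
    df["date"].dt.month  # .dt only works on datetime-like values
except AttributeError as exc:
    print("AttributeError:", exc)

df["date"] = pd.to_datetime(df["date"])  # convert first
df["month"] = df["date"].dt.month        # then access the property through .dt
print(df["month"].tolist())
# [1, 7]
```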
5. You have a dataset with a datetime column 'timestamp'. You want to create a feature that is 1 if the time is during business hours (9am to 5pm) on weekdays, else 0. Which code correctly creates this feature?
hard
A. df['business_hours'] = ((df['timestamp'].dt.hour > 9) & (df['timestamp'].dt.hour <= 17) & (df['timestamp'].dt.weekday <= 5)).astype(int)
B. df['business_hours'] = ((df['timestamp'].dt.hour > 9) & (df['timestamp'].dt.hour < 17) & (df['timestamp'].dt.weekday < 5)).astype(int)
C. df['business_hours'] = ((df['timestamp'].dt.hour >= 9) & (df['timestamp'].dt.hour <= 17) & (df['timestamp'].dt.weekday <= 5)).astype(int)
D. df['business_hours'] = ((df['timestamp'].dt.hour >= 9) & (df['timestamp'].dt.hour < 17) & (df['timestamp'].dt.weekday < 5)).astype(int)

Solution

  1. Step 1: Define business hours range

    Business hours are from 9:00 (inclusive) to 17:00 (exclusive), so hour >= 9 and hour < 17.
  2. Step 2: Define weekdays

    Weekdays are Monday (0) to Friday (4), so weekday < 5.
  3. Step 3: Combine conditions and convert to int

    Use logical AND (&) to combine conditions and convert boolean to int with .astype(int).
  4. Final Answer:

    df['business_hours'] = ((df['timestamp'].dt.hour >= 9) & (df['timestamp'].dt.hour < 17) & (df['timestamp'].dt.weekday < 5)).astype(int) -> Option D
  5. Quick Check:

    Use inclusive start, exclusive end for hours and weekday < 5 [OK]
Hint: Use >=9 and <17 for hours, weekday <5 for Mon-Fri [OK]
Common Mistakes:
  • Using >9 instead of >=9
  • Including weekend days by using <=5
  • Using <=17 instead of <17
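A sketch of Option D wrapped in a small helper and checked on boundary cases. The function name add_business_hours and the sample timestamps are hypothetical; the condition itself is the one from the correct answer.

```python
import pandas as pd

def add_business_hours(df, col="timestamp"):
    """Flag 1 for Mon-Fri between 09:00 (inclusive) and 17:00 (exclusive), else 0."""
    ts = df[col].dt
    in_hours = (ts.hour >= 9) & (ts.hour < 17)
    on_weekday = ts.weekday < 5  # Monday=0 ... Friday=4
    out = df.copy()
    out["business_hours"] = (in_hours & on_weekday).astype(int)
    return out

df = pd.DataFrame({"timestamp": pd.to_datetime([
    "2024-06-03 09:00",  # Monday, opening time -> 1
    "2024-06-03 16:59",  # Monday, just before close -> 1
    "2024-06-03 17:00",  # Monday, closing time -> 0
    "2024-06-03 08:59",  # Monday, before opening -> 0
    "2024-06-08 10:00",  # Saturday -> 0
])})
print(add_business_hours(df)["business_hours"].tolist())
# [1, 1, 0, 0, 0]
```

Because the check uses whole hours, it only works for boundaries on the hour; a closing time like 17:30 would also need the minute.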