
Date and time feature extraction in ML Python - Cheat Sheet & Quick Revision

Recall & Review
beginner
What is date and time feature extraction in machine learning?
It is the process of taking raw date and time data and turning it into useful pieces of information (features) that a machine learning model can understand and use to make predictions.
beginner
Name three common features extracted from a date-time value.
Year, month, and day of the week are three common ones; hour, minute, and second are also frequently extracted.
beginner
Why might extracting the day of the week be useful for a model?
Because some patterns depend on the day, like sales being higher on weekends or traffic being heavier on weekdays.
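As a rough illustration of this idea, the sketch below flags weekend rows and compares average sales between the two groups. The `sales` frame, its column names, and the numbers are invented for this example and do not come from any real dataset.

```python
import pandas as pd

# Hypothetical daily sales, made up purely for illustration.
sales = pd.DataFrame({
    "date": pd.to_datetime(["2024-06-03", "2024-06-04", "2024-06-08", "2024-06-09"]),
    "units": [100, 110, 180, 170],
})

# weekday is Monday=0 ... Sunday=6, so 5 and 6 mark the weekend.
sales["is_weekend"] = sales["date"].dt.weekday >= 5

# Average units sold on weekdays versus weekends.
avg_units = sales.groupby("is_weekend")["units"].mean()
print(avg_units)
```

A model given `is_weekend` (or the raw weekday number) can pick up this kind of difference directly instead of having to infer it from the full date.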
intermediate
How can you extract the hour from a timestamp in Python using pandas?
Use pandas' dt accessor: df['timestamp'].dt.hour extracts the hour part from a datetime column.
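A minimal sketch of the `dt` accessor in use, assuming a DataFrame with a datetime column named `timestamp` as in the card above. The two sample timestamps are arbitrary.

```python
import pandas as pd

# Arbitrary sample timestamps; 'timestamp' matches the column name used above.
df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2024-06-01 14:30:00", "2024-12-31 23:59:59"])
})

ts = df["timestamp"]
df["year"] = ts.dt.year
df["month"] = ts.dt.month
df["day"] = ts.dt.day
df["weekday"] = ts.dt.weekday  # Monday=0 ... Sunday=6
df["hour"] = ts.dt.hour
df["minute"] = ts.dt.minute

print(df.drop(columns="timestamp"))
```

Each of these is a property on the accessor, so none of them takes parentheses.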
intermediate
What is the benefit of extracting cyclical features like hour or month as sine and cosine values?
It tells the model that these features repeat in cycles. On a plain 0-23 scale, hour 23 and hour 0 look far apart, but a sine and cosine pair places them next to each other, so the wrap-around is preserved.
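One common way to build such features is to map the hour onto a circle with a 24-hour period. The sketch below assumes that mapping; the helper name `encode_hour` is hypothetical and not part of pandas or NumPy.

```python
import numpy as np
import pandas as pd

def encode_hour(hour):
    """Map an hour (0-23) to a point on the unit circle (hypothetical helper)."""
    angle = 2 * np.pi * hour / 24
    return np.array([np.sin(angle), np.cos(angle)])

# Applying the same formula to a whole column.
df = pd.DataFrame({"hour": [0, 6, 12, 23]})
df["hour_sin"] = np.sin(2 * np.pi * df["hour"] / 24)
df["hour_cos"] = np.cos(2 * np.pi * df["hour"] / 24)

# Distance between encoded hours: 23 and 0 are neighbours, 12 and 0 are opposite.
d_near = float(np.linalg.norm(encode_hour(23) - encode_hour(0)))  # roughly 0.26
d_far = float(np.linalg.norm(encode_hour(12) - encode_hour(0)))   # 2.0
print(d_near, d_far)
```

Both columns are needed: sine alone cannot tell hour 0 from hour 12, and cosine alone cannot tell hour 6 from hour 18. For months, the same formula applies with a period of 12.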
Which feature is NOT typically extracted from a datetime value?
A. Day of the week
B. Hour of the day
C. User's age
D. Month
Why do we sometimes convert hour or month into sine and cosine values?
A. To capture the cyclical nature of time
B. To make the data categorical
C. To reduce the number of features
D. To normalize the data between 0 and 1
Which Python library is commonly used to extract date and time features?
A. NumPy
B. pandas
C. matplotlib
D. scikit-learn
What does df['date'].dt.weekday return in pandas?
A. The day of the week as an integer (Monday=0)
B. The day of the month
C. The week number in the year
D. The month number
Which feature might help predict traffic patterns best?
A. Year
B. Microsecond
C. Second
D. Day of the week
Explain how you would extract useful features from a timestamp for a machine learning model.
Think about breaking down the timestamp into parts and representing repeating patterns.
Why is it important to transform cyclical time features like hour or month before using them in models?
Consider how time repeats and how numbers alone might mislead the model.

      Practice

      1. Which of the following is a common feature extracted from a date to help machine learning models?
      easy
      A. Font size
      B. Color
      C. Month
      D. Temperature

      Solution

      1. Step 1: Understand date features

        Date features include parts of a date like year, month, day, hour, and weekday.
      2. Step 2: Identify relevant feature

        Among the options, only 'Month' is a component of a date. Font size and color are unrelated, and temperature is a separate measurement, not something derived from the date.
      3. Final Answer:

        Month -> Option C
      4. Quick Check:

        Date feature = Month [OK]
      Hint: Pick the option that relates directly to date parts
      Common Mistakes:
      • Choosing unrelated features like color or font size
      • Confusing date features with unrelated data
      2. Which Python code correctly extracts the weekday from a pandas datetime column named 'date'?
      easy
      A. df['weekday'] = df['date'].dt.weekday
      B. df['weekday'] = df['date'].weekday()
      C. df['weekday'] = df['date'].weekday
      D. df['weekday'] = df['date'].dt.weekday()

      Solution

      1. Step 1: Recall pandas datetime accessor

        To extract weekday, use the .dt accessor followed by .weekday without parentheses.
      2. Step 2: Check each option

        Option A uses .dt.weekday correctly. Option B calls weekday() directly on the Series, which has no such method. Option C omits the .dt accessor. Option D adds parentheses after .dt.weekday, but it is a property, not a method.
      3. Final Answer:

        df['weekday'] = df['date'].dt.weekday -> Option A
      4. Quick Check:

        Use .dt.weekday without parentheses [OK]
      Hint: Use .dt.weekday without parentheses for pandas datetime
      Common Mistakes:
      • Calling weekday() as a method on series
      • Missing .dt accessor
      • Adding parentheses after .weekday
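The three mistakes above can be reproduced in a short sketch. The sample dates are arbitrary, and the `errors` dictionary is only a hypothetical way of recording which exception each wrong form raises.

```python
import pandas as pd

df = pd.DataFrame({"date": pd.to_datetime(["2024-06-03", "2024-06-09"])})

# Correct: .dt.weekday is a property (Monday=0 ... Sunday=6).
df["weekday"] = df["date"].dt.weekday

errors = {}

# Missing .dt: a Series has no 'weekday' attribute.
try:
    df["date"].weekday
except AttributeError as exc:
    errors["missing_dt"] = type(exc).__name__

# Parentheses after .dt.weekday: the property returns a Series, which is not callable.
try:
    df["date"].dt.weekday()
except TypeError as exc:
    errors["called_property"] = type(exc).__name__

print(df["weekday"].tolist(), errors)

# A single pandas Timestamp is different: there weekday() is a method.
first_day = df["date"].iloc[0].weekday()
```

The parentheses confusion likely comes from scalar `Timestamp` and `datetime` objects, where `weekday()` really is a method.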
      3. Given the code:
      import pandas as pd
      df = pd.DataFrame({'date': pd.to_datetime(['2024-06-01 14:30', '2024-06-02 09:15'])})
      df['hour'] = df['date'].dt.hour
      df['is_weekend'] = df['date'].dt.weekday >= 5
      print(df[['hour', 'is_weekend']].to_dict())

      What is the printed output?
      medium
      A. {'hour': {0: 14, 1: 9}, 'is_weekend': {0: False, 1: False}}
      B. {'hour': {0: 14, 1: 9}, 'is_weekend': {0: True, 1: True}}
      C. {'hour': {0: 14, 1: 9}, 'is_weekend': {0: False, 1: True}}
      D. SyntaxError

      Solution

      1. Step 1: Extract hour values

        The first date has hour 14, second has hour 9, so 'hour' column is {0:14, 1:9}.
      2. Step 2: Determine weekend flags

        Weekday values 5 and 6 (Saturday and Sunday) are the weekend. The dates are 2024-06-01 (Saturday=5) and 2024-06-02 (Sunday=6). Both fall on the weekend, so 'is_weekend' should be True for both.
      3. Step 3: Check code logic

        Code uses df['date'].dt.weekday >= 5, which is True for both dates. So 'is_weekend' is {0: True, 1: True}.
      4. Final Answer:

        {'hour': {0: 14, 1: 9}, 'is_weekend': {0: True, 1: True}} -> Option B
      5. Quick Check:

        Weekend days are 5 or 6, both dates match [OK]
      Hint: Check weekday numbers: 5=Saturday, 6=Sunday for weekend
      Common Mistakes:
      • Assuming weekend is false for Saturday/Sunday
      • Mixing hour extraction with weekend logic
      • Misreading weekday numbers
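When the weekday numbers are in doubt, one way to check them is to print the day names next to the integers. This sketch reuses the two dates from the question; the `check` frame is just an illustrative name.

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(["2024-06-01 14:30", "2024-06-02 09:15"]))

# Put the integer weekday next to its name to confirm the mapping.
check = pd.DataFrame({
    "weekday": dates.dt.weekday,
    "day_name": dates.dt.day_name(),
})
print(check)
```

Unlike `weekday`, `day_name()` is a method on the accessor, so it does need parentheses.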
      4. The following code aims to add a 'month' feature from a datetime column but throws an error:
      df['month'] = df['date'].month

      What is the error, and how do you fix it?
      medium
      A. AttributeError because .month must be accessed via .dt; fix: df['date'].dt.month
      B. SyntaxError due to missing parentheses; fix: df['date'].month()
      C. TypeError because 'date' is not datetime; fix: convert to datetime first
      D. No error; code is correct

      Solution

      1. Step 1: Understand pandas datetime access

        Datetime properties like month must be accessed with .dt when working on a pandas Series.
      2. Step 2: Identify error cause

        df['date'].month looks up a 'month' attribute on the Series itself. A Series has no such attribute, so pandas raises AttributeError.
      3. Step 3: Correct code

        Use df['date'].dt.month to extract month correctly.
      4. Final Answer:

        AttributeError because .month must be accessed via .dt; fix: df['date'].dt.month -> Option A
      5. Quick Check:

        Use .dt.month for pandas datetime columns [OK]
      Hint: Always use .dt before datetime properties on pandas Series
      Common Mistakes:
      • Missing .dt accessor
      • Trying to call .month() as a method
      • Not converting column to datetime type
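The last mistake in the list deserves its own sketch: if the column still holds strings, even `.dt.month` fails until the column is converted. The `raw` frame and its values are made up for this example.

```python
import pandas as pd

# Hypothetical raw data: dates stored as plain strings.
raw = pd.DataFrame({"date": ["2024-01-15", "2024-06-01"]})

# .dt only works on datetime-like values, so this raises AttributeError.
dt_failed_on_strings = False
try:
    raw["date"].dt.month
except AttributeError:
    dt_failed_on_strings = True

# Convert first, then extract the month through the .dt accessor.
raw["date"] = pd.to_datetime(raw["date"])
raw["month"] = raw["date"].dt.month
print(raw)
```

Checking `raw.dtypes` before extracting features is a quick way to catch this early.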
      5. You have a dataset with a datetime column 'timestamp'. You want to create a feature that is 1 if the time is during business hours (9am to 5pm) on weekdays, else 0. Which code correctly creates this feature?
      hard
      A. df['business_hours'] = ((df['timestamp'].dt.hour > 9) & (df['timestamp'].dt.hour <= 17) & (df['timestamp'].dt.weekday <= 5)).astype(int)
      B. df['business_hours'] = ((df['timestamp'].dt.hour > 9) & (df['timestamp'].dt.hour < 17) & (df['timestamp'].dt.weekday < 5)).astype(int)
      C. df['business_hours'] = ((df['timestamp'].dt.hour >= 9) & (df['timestamp'].dt.hour <= 17) & (df['timestamp'].dt.weekday <= 5)).astype(int)
      D. df['business_hours'] = ((df['timestamp'].dt.hour >= 9) & (df['timestamp'].dt.hour < 17) & (df['timestamp'].dt.weekday < 5)).astype(int)

      Solution

      1. Step 1: Define business hours range

        Business hours are from 9:00 (inclusive) to 17:00 (exclusive), so hour >= 9 and hour < 17.
      2. Step 2: Define weekdays

        Weekdays are Monday (0) to Friday (4), so weekday < 5.
      3. Step 3: Combine conditions and convert to int

        Use logical AND (&) to combine conditions and convert boolean to int with .astype(int).
      4. Final Answer:

        df['business_hours'] = ((df['timestamp'].dt.hour >= 9) & (df['timestamp'].dt.hour < 17) & (df['timestamp'].dt.weekday < 5)).astype(int) -> Option D
      5. Quick Check:

        Use inclusive start, exclusive end for hours and weekday < 5 [OK]
      Hint: Use >=9 and <17 for hours, weekday <5 for Mon-Fri
      Common Mistakes:
      • Using >9 instead of >=9
      • Including weekend days by using <=5
      • Using <=17 instead of <17
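To see why the boundaries matter, the sketch below applies the Option D expression to a few edge cases. The timestamps are chosen for illustration: just before opening, exactly at opening, just before closing, exactly at closing, and a Saturday.

```python
import pandas as pd

df = pd.DataFrame({"timestamp": pd.to_datetime([
    "2024-06-03 08:59",  # Monday, just before opening
    "2024-06-03 09:00",  # Monday, exactly at opening
    "2024-06-03 16:59",  # Monday, last minute before closing
    "2024-06-03 17:00",  # Monday, exactly at closing
    "2024-06-08 10:00",  # Saturday, mid-morning
])})

hour = df["timestamp"].dt.hour
weekday = df["timestamp"].dt.weekday

# Inclusive start, exclusive end, Monday (0) through Friday (4).
df["business_hours"] = ((hour >= 9) & (hour < 17) & (weekday < 5)).astype(int)
print(df)
```

Because the comparison uses only the hour, `hour < 17` keeps every minute up to 16:59 and drops 17:00 itself, whereas `hour <= 17` would wrongly keep 17:00 through 17:59.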