
Point-in-time correctness in MLOps - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
1. Fill in the blank
easy

Complete the code to load a model version at a specific timestamp.

MLOps
model = mlflow.pyfunc.load_model(model_uri='models:/my_model@[1]')
A. latest
B. 2023-05-01T12:00:00Z
C. production
D. staging
Common Mistakes
Using 'latest' loads the newest model, not the one at a specific time.
2. Fill in the blank
medium

Complete the code to query data as it existed at a specific point in time.

MLOps
query = "SELECT * FROM feature_store WHERE event_time <= '[1]'"
A. latest
B. now()
C. 2023-04-30T23:59:59Z
D. 2023-06-01
Common Mistakes
Using 'now()' includes current data, breaking point-in-time correctness.
3. Fill in the blank
hard

Fix the error in the code to ensure point-in-time correctness when loading features.

MLOps
features = feature_store.get_features(entity_id=123, as_of_time=[1])
A. '2023-05-15T10:00:00Z'
B. 'latest'
C. datetime.now()
D. None
Common Mistakes
Using datetime.now() loads current features, breaking point-in-time correctness.
4. Fill in the blank
hard

Fill both blanks to create a feature retrieval query that respects point-in-time correctness.

MLOps
SELECT feature_value FROM features WHERE entity_id = [1] AND event_timestamp <= '[2]'
A. 12345
B. 2023-05-10T08:30:00Z
C. 67890
D. 2023-06-01T00:00:00Z
Common Mistakes
Using a timestamp later than the intended cutoff leaks future data; using the wrong entity_id returns another entity's features.
5. Fill in the blank
hard

Fill all three blanks to build a dictionary comprehension that filters features for point-in-time correctness.

MLOps
filtered_features = {k: v for k, v in features.items() if v['timestamp'] [1] '[2]' and k != [3]}
A. <=
B. 2023-05-20T15:00:00Z
C. 'deprecated_feature'
D. ==
Common Mistakes
Using '==' instead of '<=' keeps only features stamped exactly at the cutoff and drops every earlier one.

Practice

1.

What does point-in-time correctness ensure in MLOps?

easy
A. Using all available data including future data for better accuracy
B. Ignoring timestamps in data processing
C. Using only data available up to a specific moment to avoid future data leaks
D. Using random data samples without time consideration

Solution

  1. Step 1: Understand the concept of point-in-time correctness

    It means using data only up to a certain moment to avoid using future information.
  2. Step 2: Identify the correct practice

    Using future data can cause wrong model results, so only past and present data should be used.
  3. Final Answer:

    Using only data available up to a specific moment to avoid future data leaks -> Option C
  4. Quick Check:

    Point-in-time correctness = use only data available at or before the cutoff [OK]
Hint: Remember: no peeking into future data for training [OK]
Common Mistakes:
  • Using future data accidentally
  • Ignoring timestamps in data
  • Assuming all data is valid regardless of time
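
To make the idea concrete, here is a minimal sketch of an "as of" lookup using only the standard library. The `history` list and the `value_as_of` helper are hypothetical names invented for illustration; a real feature store would expose its own API for this.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical feature history for one entity: (observed_at, value), sorted by time.
history = [
    (datetime(2023, 1, 1), 10.0),
    (datetime(2023, 2, 1), 12.5),
    (datetime(2023, 3, 1), 99.0),
]

def value_as_of(history, as_of):
    """Return the most recent value observed at or before `as_of`, or None."""
    times = [t for t, _ in history]
    idx = bisect_right(times, as_of)  # number of observations with time <= as_of
    return history[idx - 1][1] if idx else None

# A training label recorded on 2023-02-15 may only see the February value,
# never the March one that arrived later.
label_time = datetime(2023, 2, 15)
print(value_as_of(history, label_time))  # 12.5
```

Looking up a time before the first observation returns None, which is usually safer than silently borrowing a later value.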
2.

Which of the following is the correct way to filter data for point-in-time correctness using SQL?

easy
A. SELECT * FROM sales WHERE sale_date <= '2023-01-01'
B. SELECT * FROM sales WHERE sale_date > '2023-01-01'
C. SELECT * FROM sales WHERE sale_date = '2023-01-01'
D. SELECT * FROM sales WHERE sale_date >= '2023-01-01'

Solution

  1. Step 1: Understand filtering for point-in-time correctness

    We want data up to and including the date '2023-01-01'.
  2. Step 2: Choose the correct SQL condition

    The condition should be sale_date less than or equal to '2023-01-01' to include all past data.
  3. Final Answer:

    SELECT * FROM sales WHERE sale_date <= '2023-01-01' -> Option A
  4. Quick Check:

    Use <= for up to a date [OK]
Hint: Use <= to include data up to the cutoff date [OK]
Common Mistakes:
  • Using > instead of <=
  • Filtering only exact date instead of all past data
  • Using >= which includes future data
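
The cutoff filter can be tried against an in-memory SQLite database. This is only a sketch: the `sales` table and its three rows are made up for illustration, and the cutoff is passed as a bound parameter rather than pasted into the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, sale_date TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, "2022-12-31"), (2, "2023-01-01"), (3, "2023-01-02")],
)

cutoff = "2023-01-01"
# <= keeps everything up to and including the cutoff date.
rows = conn.execute(
    "SELECT id FROM sales WHERE sale_date <= ? ORDER BY id", (cutoff,)
).fetchall()
ids = [r[0] for r in rows]
print(ids)  # [1, 2]
conn.close()
```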
3.

Given the following Python code snippet for filtering data by timestamp, what will be the output?

data = [
  {'id': 1, 'timestamp': '2023-01-01'},
  {'id': 2, 'timestamp': '2023-02-01'},
  {'id': 3, 'timestamp': '2022-12-31'}
]
cutoff = '2023-01-01'
filtered = [d['id'] for d in data if d['timestamp'] <= cutoff]
print(filtered)
medium
A. [3]
B. [1, 2, 3]
C. [2]
D. [1, 3]

Solution

  1. Step 1: Analyze the filtering condition

    We keep items where timestamp is less than or equal to '2023-01-01'.
  2. Step 2: Check each item

    Item 1: '2023-01-01' <= '2023-01-01' (True), Item 2: '2023-02-01' <= '2023-01-01' (False), Item 3: '2022-12-31' <= '2023-01-01' (True).
  3. Final Answer:

    [1, 3] -> Option D
  4. Quick Check:

    Filter by <= cutoff date = [1, 3] [OK]
Hint: Plain string comparison orders dates correctly only when every timestamp uses the same zero-padded ISO 8601 format and time zone [OK]
Common Mistakes:
  • Including future dates mistakenly
  • Confusing < and <=
  • Ignoring date format in comparison
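
String comparison works in the snippet above because every value is a zero-padded date in the same format. The sketch below, with made-up timestamps and a hypothetical `parse_utc` helper, shows one case where it gives the wrong answer and parsing to timezone-aware datetimes gives the right one.

```python
from datetime import datetime, timezone

def parse_utc(ts):
    """Parse an ISO 8601 timestamp that carries a UTC offset and convert it to UTC."""
    return datetime.fromisoformat(ts).astimezone(timezone.utc)

cutoff = "2023-01-01T00:00:00+00:00"
event = "2023-01-01T01:00:00+05:00"  # equals 2022-12-31T20:00:00 in UTC

string_ok = event <= cutoff                        # compares characters, ignores the offset
parsed_ok = parse_utc(event) <= parse_utc(cutoff)  # compares actual instants

print(string_ok)  # False: the event is wrongly treated as after the cutoff
print(parsed_ok)  # True: the event really happened before the cutoff
```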
4.

Identify the error in this code snippet that tries to enforce point-in-time correctness:

def filter_data(data, cutoff):
    return [d for d in data if d['timestamp'] > cutoff]

# cutoff = '2023-01-01'
medium
A. The list comprehension syntax is incorrect
B. The comparison should be <= cutoff, not > cutoff
C. The cutoff variable is not defined
D. The function should return all data without filtering

Solution

  1. Step 1: Understand the filtering logic

    Point-in-time correctness requires data up to the cutoff date, so timestamps should be less than or equal to cutoff.
  2. Step 2: Identify the error in comparison

    The code uses > cutoff, which selects future data instead of past data.
  3. Final Answer:

    The comparison should be <= cutoff, not > cutoff -> Option B
  4. Quick Check:

    Use <= cutoff to filter past data [OK]
Hint: Filter with <= cutoff, not > cutoff [OK]
Common Mistakes:
  • Using > instead of <=
  • Ignoring cutoff definition
  • Incorrect list comprehension syntax
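
For reference, a corrected version of the function might look like the sketch below. The sample records are reused from question 3 purely for illustration.

```python
def filter_data(data, cutoff):
    """Keep records whose timestamp is at or before the cutoff."""
    return [d for d in data if d['timestamp'] <= cutoff]

data = [
    {'id': 1, 'timestamp': '2023-01-01'},
    {'id': 2, 'timestamp': '2023-02-01'},
    {'id': 3, 'timestamp': '2022-12-31'},
]
result = [d['id'] for d in filter_data(data, '2023-01-01')]
print(result)  # [1, 3]
```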
5.

You have a dataset with multiple features collected over time. You want to create a feature store snapshot that guarantees point-in-time correctness for model training on 2023-03-01. Which approach is best?

hard
A. Filter all features to include only data with timestamps <= '2023-03-01' and save as snapshot
B. Include data with timestamps > '2023-03-01' to improve model accuracy
C. Use the latest data available regardless of timestamp
D. Randomly sample data without considering timestamps

Solution

  1. Step 1: Understand snapshot purpose

    A snapshot should represent data exactly as it was up to the training date to avoid future data leaks.
  2. Step 2: Choose filtering strategy

    Filtering all features with timestamps less than or equal to '2023-03-01' ensures point-in-time correctness.
  3. Step 3: Save filtered data as snapshot

    This snapshot can be used safely for training without future data contamination.
  4. Final Answer:

    Filter all features to include only data with timestamps <= '2023-03-01' and save as snapshot -> Option A
  5. Quick Check:

    Snapshot = Filter by cutoff date [OK]
Hint: Snapshot = data filtered by cutoff timestamp [OK]
Common Mistakes:
  • Using future data in snapshot
  • Ignoring timestamp filtering
  • Random sampling without time consideration
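
One possible way to build such a snapshot is sketched below. It assumes a simple feature log of (entity, feature, date, value) rows; the `rows` data and the `build_snapshot` helper are hypothetical, not part of any particular feature store. Besides dropping rows after the cutoff, it keeps only the latest surviving value per entity and feature.

```python
from datetime import date

# Hypothetical feature log rows: (entity_id, feature_name, observed_on, value)
rows = [
    (1, "avg_spend", date(2023, 2, 1), 40.0),
    (1, "avg_spend", date(2023, 2, 20), 45.0),
    (1, "avg_spend", date(2023, 3, 5), 80.0),  # after the cutoff, must be excluded
    (2, "avg_spend", date(2023, 3, 1), 12.0),  # exactly on the cutoff, included
]

def build_snapshot(rows, cutoff):
    """Latest value per (entity, feature) among rows observed on or before cutoff."""
    latest = {}
    for entity_id, name, observed_on, value in rows:
        if observed_on > cutoff:
            continue  # future data relative to the snapshot date
        key = (entity_id, name)
        if key not in latest or observed_on > latest[key][0]:
            latest[key] = (observed_on, value)
    return {key: value for key, (_, value) in latest.items()}

snapshot = build_snapshot(rows, date(2023, 3, 1))
print(snapshot)  # {(1, 'avg_spend'): 45.0, (2, 'avg_spend'): 12.0}
```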