
dbt-expectations for data quality - Time & Space Complexity

Understanding Time Complexity

When using dbt-expectations to check data quality, it is important to understand how the time to run these checks changes as data grows.

We want to know how the cost of running these tests scales with the size of the data.

Scenario Under Consideration

Analyze the time complexity of the following dbt-expectations test.


    models:
      - name: users
        columns:
          - name: user_id
            tests:
              - dbt_expectations.expect_column_values_to_not_be_null:
                  config:
                    severity: error

This test checks that the user_id column in the users model contains no NULL (missing) values.
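Under the hood, dbt compiles a test like this into a SQL query that selects the failing rows (roughly `select * from users where user_id is null`); the test passes only if that query returns nothing. As a rough sketch, the same scan can be simulated in Python (the table data below is invented purely for illustration):

```python
def not_null_failures(rows, column):
    """Simulate a not-null test: collect every row whose value in
    `column` is missing. The loop touches each row exactly once, so
    the work grows linearly with the number of rows."""
    return [row for row in rows if row.get(column) is None]

# Toy stand-in for the users table (illustrative data only).
users = [{"user_id": 1}, {"user_id": None}, {"user_id": 3}]

failures = not_null_failures(users, "user_id")
# Like a dbt test, this "passes" only when no failing rows are found.
test_passed = len(failures) == 0
```

Here the test would fail, because one row has a missing user_id.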

Identify Repeating Operations

Look at what the test does repeatedly.

  • Primary operation: Scanning each row in the user_id column to check for nulls.
  • How many times: Once for every row in the users table.

How Execution Grows With Input

As the number of rows grows, the test must check more values.

Input Size (n)    Approx. Operations
10                10 null checks
100               100 null checks
1000              1000 null checks

Pattern observation: The number of checks grows directly with the number of rows.
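The pattern in the table can be checked directly. This small Python sketch (a hypothetical helper, not part of dbt) counts one null comparison per row and shows that the count tracks the row count exactly:

```python
def count_null_checks(column_values):
    """Scan a column the way a not-null test does, counting one
    null comparison per value."""
    checks = 0
    for value in column_values:
        checks += 1        # one comparison...
        _ = value is None  # ...per row
    return checks

# Ten times the rows means ten times the checks: the hallmark of O(n).
for n in (10, 100, 1000):
    print(f"{n} rows -> {count_null_checks([1] * n)} checks")
```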

Final Time Complexity

Time Complexity: O(n)

This means the time to run the test grows in a straight line as the data size increases.

Common Mistake

[X] Wrong: "The test runs instantly no matter how big the data is."

[OK] Correct: The test must look at every row to be sure there are no nulls, so it takes longer with more data.
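One subtlety worth spelling out: a failing test could in principle stop at the first null it finds, but the passing case, the one we usually hope for, has no shortcut. Proving a column is clean requires examining every row. A sketch of an early-exit scan (hypothetical helper name) makes this concrete:

```python
def first_null_index(column_values):
    """Early-exit scan: stop at the first null. Fast when the data
    is bad, but a clean column still forces a full O(n) pass."""
    comparisons = 0
    for i, value in enumerate(column_values):
        comparisons += 1
        if value is None:
            return i, comparisons  # found a null early, stop here
    return None, comparisons       # clean column: n comparisons

# A clean 1000-row column needs all 1000 comparisons to prove
# there are no nulls.
idx, comparisons = first_null_index([42] * 1000)
```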

Interview Connect

Understanding how data quality checks scale helps you write efficient tests and explain their impact clearly in real projects.

Self-Check

"What if we added a test that checks uniqueness of a column instead of nulls? How would the time complexity change?"