Metric definitions and semantic layer in dbt - Time & Space Complexity
When we define metrics in dbt's semantic layer, we want to know how the cost of computing them scales as the underlying data grows. In other words: how much more work is needed to compute a metric when the number of rows increases?
Analyze the time complexity of the following dbt metric definition.
```yaml
metrics:
  - name: total_sales
    model: ref('orders')
    label: "Total Sales"
    calculation_method: sum
    expression: sales_amount
    timestamp: order_date
    time_grains:
      - day
      - month
      - year
```
This metric sums the sales amount from the orders table, grouped by time grains like day or month.
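Conceptually, dbt compiles this metric into a grouped aggregation over the orders table. A minimal Python sketch of the same computation (the in-memory `orders` rows and the string-prefix grouping are hypothetical stand-ins for the compiled SQL):

```python
from collections import defaultdict

# Hypothetical orders table: (order_date, sales_amount) rows.
orders = [
    ("2024-01-03", 120.0),
    ("2024-01-15", 80.0),
    ("2024-02-02", 200.0),
]

def total_sales_by_grain(rows, grain):
    """Sum sales_amount grouped by a time grain, like the compiled metric query.

    grain: number of leading characters of the ISO date to group on
           (10 -> "YYYY-MM-DD" for day, 7 -> "YYYY-MM" for month, 4 -> "YYYY" for year).
    """
    totals = defaultdict(float)
    for order_date, sales_amount in rows:  # one pass over all n rows
        totals[order_date[:grain]] += sales_amount
    return dict(totals)

print(total_sales_by_grain(orders, 7))
# {'2024-01': 200.0, '2024-02': 200.0}
```

Note that regardless of the grain chosen, every row is read exactly once; the grain only changes how many groups the sums land in.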
Look at what repeats when calculating this metric.
- Primary operation: Summing sales_amount over rows in the orders table.
- How many times: every row in the orders table is read once per query; each row contributes to exactly one group at the chosen grain (day, month, or year).
As the number of orders grows, the sum calculation must look at more rows.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 sums |
| 100 | 100 sums |
| 1000 | 1000 sums |
Pattern observation: The work grows directly with the number of rows to sum.
Time Complexity: O(n)
This means the time to compute the metric grows linearly with the number of rows in the data.
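The linear pattern from the table above can be checked empirically by counting how many additions a (hypothetical) summing loop performs for different input sizes:

```python
def count_sum_operations(n):
    """Count the additions needed to sum n rows; mirrors the table above."""
    rows = [1.0] * n          # n order rows with a dummy sales_amount
    ops = 0
    total = 0.0
    for amount in rows:
        total += amount
        ops += 1              # one addition per row -> O(n) overall
    return ops

for n in (10, 100, 1000):
    print(n, count_sum_operations(n))
# 10 10
# 100 100
# 1000 1000
```

Doubling the number of rows doubles the operation count, which is exactly what O(n) growth means.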
[X] Wrong: "Defining a metric means the calculation time stays the same no matter how much data there is."
[OK] Correct: The metric calculation scans data rows, so more data means more work and longer time.
Understanding how metric calculations scale helps you explain data processing efficiency clearly and confidently.
What if we added a filter to the metric to only include orders from the last month? How would the time complexity change?
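One way to start exploring this question: with a naive scan, a date filter still examines every row to test the predicate, so the scan remains O(n); only if the warehouse can prune partitions or clusters on order_date does the work drop toward O(k), where k is the number of rows in the filtered window. A rough Python sketch of the naive case (the rows and cutoff date are hypothetical):

```python
def total_sales_since(rows, cutoff):
    """Naive filtered sum: every row is still examined once (O(n))."""
    total = 0.0
    for order_date, sales_amount in rows:
        if order_date >= cutoff:      # ISO date strings compare lexicographically
            total += sales_amount
    return total

orders = [("2024-01-03", 120.0), ("2024-02-02", 200.0), ("2024-02-20", 50.0)]
print(total_sales_since(orders, "2024-02-01"))
# 250.0
```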