Documenting models in YAML in dbt - Time & Space Complexity

Time Complexity: Documenting models in YAML
O(n)
Understanding Time Complexity

We want to understand how the time dbt takes to process YAML model documentation grows as the number of models increases.

How does adding more models affect the work done by dbt when reading documentation?

Scenario Under Consideration

Analyze the time complexity of the following YAML documentation snippet for dbt models.

version: 2
models:
  - name: customers
    description: "Contains customer details"
    columns:
      - name: id
        description: "Unique customer ID"
      - name: name
        description: "Customer full name"
  - name: orders
    description: "Contains order records"
    columns:
      - name: order_id
        description: "Unique order ID"
      - name: customer_id
        description: "ID of the customer who placed the order"

This YAML documents two models, each with columns and descriptions.

Identify Repeating Operations

Look at what repeats when dbt processes this YAML documentation.

  • Primary operation: Reading each model entry and each of its column entries to build documentation.
  • How many times: Once per model, plus once per column inside each model.
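
The repetition described above can be sketched in Python. This is a simplified model, not dbt's actual parser: the `schema` dictionary stands in for what a YAML parser would return for the snippet above, and `count_doc_entries` is a hypothetical helper that only counts how many entries are visited.

```python
# Hypothetical in-memory form of the YAML snippet above
# (the nested dict/list structure a YAML parser would produce).
schema = {
    "version": 2,
    "models": [
        {
            "name": "customers",
            "description": "Contains customer details",
            "columns": [
                {"name": "id", "description": "Unique customer ID"},
                {"name": "name", "description": "Customer full name"},
            ],
        },
        {
            "name": "orders",
            "description": "Contains order records",
            "columns": [
                {"name": "order_id", "description": "Unique order ID"},
                {
                    "name": "customer_id",
                    "description": "ID of the customer who placed the order",
                },
            ],
        },
    ],
}


def count_doc_entries(schema: dict) -> int:
    """Visit every model and every column once; return the number of visits."""
    visits = 0
    for model in schema.get("models", []):
        visits += 1  # one visit per model
        for _column in model.get("columns", []):
            visits += 1  # one visit per column
    return visits


print(count_doc_entries(schema))  # 2 models + 4 columns = 6
```

Each entry is touched exactly once, so the count equals the number of models plus the number of columns.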

How Execution Grows With Input

As the number of models and columns grows, dbt reads more entries.

Input Size (models)    Approx. Operations
10                     Reads about 10 models and their columns
100                    Reads about 100 models and their columns
1000                   Reads about 1000 models and their columns

Pattern observation: The work grows roughly in direct proportion to the total number of models and columns, so ten times as many entries means roughly ten times as much reading.
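
The pattern can be checked with a small Python sketch. It is an illustration under stated assumptions rather than a measurement of dbt itself: `make_schema` and `count_visits` are hypothetical helpers, and every synthetic model is assumed to have exactly two columns, as in the snippet above.

```python
def make_schema(num_models: int, columns_per_model: int = 2) -> dict:
    """Build a synthetic schema with the same shape as the YAML snippet."""
    return {
        "version": 2,
        "models": [
            {
                "name": f"model_{i}",
                "description": f"Synthetic model {i}",
                "columns": [
                    {"name": f"col_{j}", "description": f"Synthetic column {j}"}
                    for j in range(columns_per_model)
                ],
            }
            for i in range(num_models)
        ],
    }


def count_visits(schema: dict) -> int:
    """Count one visit per model plus one visit per column."""
    visits = 0
    for model in schema["models"]:
        visits += 1
        visits += len(model["columns"])
    return visits


for num_models in (10, 100, 1000):
    print(num_models, count_visits(make_schema(num_models)))
# 10 30
# 100 300
# 1000 3000
```

With two columns per model, each model contributes three visits, so the total is 3 times the number of models: a straight line.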

Final Time Complexity

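Time Complexity: O(n), where n is the total number of documented models and columns.

This means the time to process documentation grows linearly with the number of entries: with m models and c columns in total, dbt does work proportional to m + c.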

Common Mistake

[X] Wrong: "Adding more models won't affect processing time much because YAML is just text."

[OK] Correct: Even though YAML is text, dbt must read and parse each model and column, so more models mean more work.

Interview Connect

Understanding how processing time grows with input size helps you explain efficiency in real projects, showing you can think about scaling and performance.

Self-Check

"What if we added nested descriptions or tests inside each model? How would the time complexity change?"