Documenting models in YAML in dbt - Time & Space Complexity
We want to understand how the time it takes to document models in YAML grows as the number of models increases.
How does adding more models affect the work done by dbt when reading documentation?
Analyze the time complexity of the following YAML documentation snippet for dbt models.
version: 2
models:
  - name: customers
    description: "Contains customer details"
    columns:
      - name: id
        description: "Unique customer ID"
      - name: name
        description: "Customer full name"
  - name: orders
    description: "Contains order records"
    columns:
      - name: order_id
        description: "Unique order ID"
      - name: customer_id
        description: "ID of the customer who placed the order"
This YAML documents two models, each with columns and descriptions.
Look at what repeats when dbt processes this YAML documentation.
- Primary operation: Reading each model and its columns to build documentation.
- How many times: Once per model, and once per column inside each model.
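The repetition described above can be sketched in Python. This is only an illustration: the nested dicts and lists are written by hand to mirror what a YAML parser would typically return for the snippet, and `count_visits` is a hypothetical helper, not part of dbt.

```python
# Hand-written stand-in for the parsed YAML (assumption: a YAML parser
# would return nested dicts and lists shaped like this).
doc = {
    "version": 2,
    "models": [
        {
            "name": "customers",
            "description": "Contains customer details",
            "columns": [
                {"name": "id", "description": "Unique customer ID"},
                {"name": "name", "description": "Customer full name"},
            ],
        },
        {
            "name": "orders",
            "description": "Contains order records",
            "columns": [
                {"name": "order_id", "description": "Unique order ID"},
                {
                    "name": "customer_id",
                    "description": "ID of the customer who placed the order",
                },
            ],
        },
    ],
}


def count_visits(doc):
    """Count one visit per model and one per column (hypothetical helper)."""
    visits = 0
    for model in doc.get("models", []):
        visits += 1  # read the model entry
        for _column in model.get("columns", []):
            visits += 1  # read one column entry
    return visits


print(count_visits(doc))  # 2 models + 4 columns = 6
```

Each entry is touched exactly once, which is why the count equals the number of models plus the number of columns.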
As the number of models and columns grows, dbt reads more entries.
| Input Size (models) | Approx. Operations |
|---|---|
| 10 | Reads about 10 models and their columns |
| 100 | Reads about 100 models and their columns |
| 1000 | Reads about 1000 models and their columns |
Pattern observation: The work grows roughly in direct proportion to the number of models and columns.
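Time Complexity: O(n), where n is the total number of documented entries (models plus columns)
This means the time to process documentation grows linearly with the number of models and columns. If every model has roughly the same number of columns, doubling the number of models roughly doubles the work.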
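To see the linear growth in numbers, the sketch below builds synthetic projects of different sizes and counts their entries. `make_project` and `total_entries` are hypothetical helpers, and the fixed two columns per model is an assumption chosen to match the example above.

```python
def make_project(n_models, cols_per_model=2):
    """Build a synthetic parsed document with n_models models (illustrative only)."""
    return {
        "models": [
            {
                "name": f"model_{i}",
                "description": "",
                "columns": [
                    {"name": f"col_{j}", "description": ""}
                    for j in range(cols_per_model)
                ],
            }
            for i in range(n_models)
        ]
    }


def total_entries(project):
    """One entry per model plus one per column."""
    return sum(1 + len(model["columns"]) for model in project["models"])


for n in (10, 100, 1000):
    print(n, total_entries(make_project(n)))
# 10 30
# 100 300
# 1000 3000
```

Multiplying the number of models by 10 multiplies the entry count by 10, which is the linear pattern the table describes.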
[X] Wrong: "Adding more models won't affect processing time much because YAML is just text."
[OK] Correct: Even though YAML is text, dbt must read and parse each model and column, so more models mean more work.
Understanding how processing time grows with input size helps you explain efficiency in real projects, showing you can think about scaling and performance.
"What if we added nested descriptions or tests inside each model? How would the time complexity change?"
Practice
Solution
Step 1: Understand the role of YAML documentation
YAML files in dbt are used to add metadata like descriptions, not to run code or store data.
Step 2: Identify the benefit of documentation
Adding descriptions for models and columns helps team members understand the data and maintain the project easily.
Final Answer:
To add clear descriptions for models and columns to improve understanding -> Option C
Quick Check:
Documentation purpose = Add descriptions [OK]
- Thinking YAML runs SQL code
- Confusing YAML with data storage
- Ignoring the importance of descriptions
orders in a YAML file?
Solution
Step 1: Recall YAML syntax for dbt model documentation
dbt expects a list under `models:` with each model as a dictionary containing `name` and `description`.
Step 2: Match the correct structure
The following option correctly uses a list with a dictionary having `name` and `description`:
models:
  - name: orders
    description: 'Contains order details'
Other options misuse keys or structure.
Final Answer:
The structure shown in Step 2 -> Option D
Quick Check:
Model list with name and description = the structure shown in Step 2 [OK]
- Using singular 'model' instead of 'models'
- Not using dash for list items
- Incorrect indentation or key names
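The mistakes listed above can be caught with a small structural check. The sketch below works on already-parsed data (plain dicts and lists written by hand) and is not how dbt validates files; `check_models_block` is a hypothetical helper.

```python
def check_models_block(doc):
    """Return a list of problems found in a parsed properties file (illustrative only)."""
    problems = []
    if "models" not in doc:
        problems.append("missing top-level 'models' key")
        return problems
    models = doc["models"]
    if not isinstance(models, list):
        problems.append("'models' must be a list (is a dash missing?)")
        return problems
    for i, model in enumerate(models):
        if not isinstance(model, dict) or "name" not in model:
            problems.append(f"models[{i}] needs a 'name' key")
    return problems


# Correct: 'models' is a list with one dictionary.
good = {"models": [{"name": "orders", "description": "Contains order details"}]}
# Mistake: singular 'model' key.
singular = {"model": [{"name": "orders"}]}
# Mistake: no dash, so 'models' becomes a dictionary instead of a list.
no_dash = {"models": {"name": "orders", "description": "Contains order details"}}

print(check_models_block(good))      # []
print(check_models_block(singular))  # ["missing top-level 'models' key"]
print(check_models_block(no_dash))   # ["'models' must be a list (is a dash missing?)"]
```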
models:
  - name: customers
    description: 'Customer information'
    columns:
      - name: id
        description: 'Unique customer ID'
      - name: email
        description: 'Customer email address'
What will dbt show as the description for the email column?
Solution
Step 1: Locate the column description in YAML
The `email` column is listed under `columns` with its own `description` key.
Step 2: Identify the description text for the email column
The description for `email` is 'Customer email address', which dbt will display for that column.
Final Answer:
Customer email address -> Option B
Quick Check:
Column description matches YAML text [OK]
- Confusing model description with column description
- Missing indentation causing YAML parsing errors
- Assuming no description if not repeated
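The lookup in this question can be sketched as a walk over the parsed structure. The data below is a hand-written stand-in for the parsed YAML, and `column_description` is a hypothetical helper rather than a dbt function.

```python
# Hand-written stand-in for the parsed YAML from the question.
doc = {
    "models": [
        {
            "name": "customers",
            "description": "Customer information",
            "columns": [
                {"name": "id", "description": "Unique customer ID"},
                {"name": "email", "description": "Customer email address"},
            ],
        }
    ]
}


def column_description(doc, model_name, column_name):
    """Return the description of one column, or None if it is not documented."""
    for model in doc.get("models", []):
        if model.get("name") != model_name:
            continue
        for column in model.get("columns", []):
            if column.get("name") == column_name:
                return column.get("description")
    return None


print(column_description(doc, "customers", "email"))  # Customer email address
```

The column's own `description` is returned; the model-level description ('Customer information') is a separate key and is never mixed in.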
models:
  - name: sales
    description: 'Sales data'
    columns:
      name: amount
      description: 'Sale amount'
What is the error in this YAML?
Solution
Step 1: Check YAML list syntax for columns
Each column should be a list item with a dash (-) before its dictionary of keys.
Step 2: Identify missing dash in columns
The `name` and `description` keys under `columns` lack the dash, so YAML treats them as keys of `columns` instead of list items.
Final Answer:
Missing dash (-) before column name and description -> Option A
Quick Check:
List items need dash (-) in YAML [OK]
- Forgetting dash for list items
- Misplacing description keys
- Confusing YAML lists and dictionaries
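To make the difference between a list and a dictionary concrete, the sketch below writes out by hand what each form of `columns` would look like once parsed (an assumption about typical YAML parser output, not a claim about dbt internals). `column_entries` is a hypothetical helper.

```python
# With the dash: 'columns' is a list containing one dictionary.
with_dash = {
    "name": "sales",
    "columns": [{"name": "amount", "description": "Sale amount"}],
}

# Without the dash: 'columns' is itself a dictionary.
without_dash = {
    "name": "sales",
    "columns": {"name": "amount", "description": "Sale amount"},
}


def column_entries(model):
    """Return what a loop over `columns` would see."""
    return [entry for entry in model["columns"]]


print(column_entries(with_dash))     # [{'name': 'amount', 'description': 'Sale amount'}]
print(column_entries(without_dash))  # ['name', 'description']
```

Looping over the dictionary form yields the key names rather than column entries, which is why code expecting a list of columns cannot use it.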
users and transactions, each with columns and descriptions. Which YAML structure correctly documents both models with their columns?
Solution
Step 1: Understand YAML list structure for multiple models
dbt expects `models` as a list of dictionaries, each with `name`, `description`, and `columns` as a list.
Step 2: Evaluate each option's structure
The following structure correctly uses a list with two model dictionaries, each with proper keys and column lists:
models:
  - name: users
    description: 'User data'
    columns:
      - name: user_id
        description: 'User identifier'
  - name: transactions
    description: 'Transaction data'
    columns:
      - name: transaction_id
        description: 'Transaction identifier'
The other options misuse keys or structure. For example, this variant repeats the same keys at one level instead of using list items:
models:
  name: users
  description: 'User data'
  columns:
    - name: user_id
      description: 'User identifier'
  name: transactions
  description: 'Transaction data'
  columns:
    - name: transaction_id
      description: 'Transaction identifier'
Final Answer:
The list-based structure shown first in Step 2 -> Option A
Quick Check:
Multiple models as list items with name and columns = the list-based structure shown first in Step 2 [OK]
- Using model names as keys instead of list items
- Repeating keys at same level
- Not using dash for multiple models
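Why repeating keys at the same level loses information can be shown with a plain Python mapping, which can hold each key only once. This is an analogy under stated assumptions: some YAML parsers silently keep the last value for a duplicate key and others reject the file, and the sketch makes no claim about which behaviour dbt shows.

```python
# Key/value pairs in the order they would appear without dashes.
pairs = [
    ("name", "users"),
    ("description", "User data"),
    ("name", "transactions"),
    ("description", "Transaction data"),
]

# A mapping keeps one value per key, so the second model overwrites the first.
as_mapping = dict(pairs)
print(as_mapping)  # {'name': 'transactions', 'description': 'Transaction data'}

# A list of mappings (one per dash) keeps both models.
as_list = [dict(pairs[:2]), dict(pairs[2:])]
print([model["name"] for model in as_list])  # ['users', 'transactions']
```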
