
Documenting models in YAML in dbt - Deep Dive

Overview - Documenting models in YAML
What is it?
Documenting models in YAML means writing clear descriptions and details about your data models using a simple text format called YAML. This helps explain what each model does, its columns, and how it fits in the bigger data project. YAML is easy to read and write, making it perfect for sharing information with your team. In dbt, YAML files store this documentation alongside your models.
Why it matters
Without documentation, data models become confusing and hard to use, especially as projects grow or new people join. Documenting models in YAML solves this by making the purpose and structure of data clear and accessible. This saves time, reduces mistakes, and helps everyone trust and understand the data they work with. Imagine trying to use a map without any labels: documentation adds those labels.
Where it fits
Before documenting models, you should understand basic dbt model creation and SQL queries. After learning documentation, you can explore automated testing and data lineage visualization. Documenting models is a key step between building models and ensuring their quality and usability.
Mental Model
Core Idea
Documenting models in YAML is like writing a clear label and instruction sheet for each data model so everyone knows what it is and how to use it.
Think of it like...
It's like putting name tags and descriptions on boxes in a storage room so anyone can find and understand what's inside without opening every box.
┌─────────────────────────────────┐
│ dbt Project Folder              │
│ └─ models/                      │
│    ├─ sales.sql                 │
│    ├─ customers.sql             │
│    └─ models.yml (YAML file)    │
│       └─ models:                │
│          ├─ name: sales         │
│          │  description: ...    │
│          └─ name: customers     │
│             description: ...    │
└─────────────────────────────────┘
Build-Up - 7 Steps
1. Foundation: Understanding YAML Basics
Concept: Learn what YAML is and how its simple structure works for writing data.
YAML stands for 'YAML Ain't Markup Language'. It uses indentation (spaces, never tabs) and a few simple symbols to organize information: lists use dashes (-), and key-value pairs use colons (:). This makes YAML easier to read and write than more verbose formats like XML or JSON.
Result
You can write and read basic YAML files with lists and key-value pairs.
Understanding YAML's simple syntax is essential because it is the foundation for writing clear documentation in dbt.
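As a small illustration of the syntax described in this step, the fragment below combines key-value pairs, a list, and nesting. The keys (project, owners, settings) are made up for the example and are not dbt settings.

```yaml
# Key-value pairs use a colon followed by a space.
project: sales_analytics
refresh: daily

# A list: each item starts with a dash.
owners:
  - alice
  - bob

# Nesting is expressed with spaces (never tabs).
settings:
  timezone: UTC
  retries: 3
```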
2. Foundation: What Are dbt Models?
Concept: Know what a dbt model is and how it represents a data transformation.
In dbt, a model is a SQL file that defines a table or view in your data warehouse. Models transform raw data into clean, usable tables. Each model has a name, usually the filename without extension, and produces a dataset.
Result
You can identify and create basic dbt models using SQL files.
Knowing what models are helps you understand what you need to document and why.
3. Intermediate: Structure of Model Documentation in YAML
Before reading on: do you think YAML documentation for models includes only model names or also details like descriptions and columns? Commit to your answer.
Concept: Learn how to organize model documentation with model names, descriptions, and column details in YAML.
In dbt, the YAML file for documentation has a 'models' section. Each model entry includes the model's name, a description explaining its purpose, and optionally a 'columns' list describing each column's name and meaning. Indentation and dashes organize these details clearly.
Result
You can write a YAML file that documents multiple models with descriptions and column info.
Knowing the structure lets you create documentation that is both human-friendly and machine-readable for dbt.
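A minimal sketch of that structure, assuming two hypothetical models named sales and customers and a properties file such as models/models.yml. Older dbt releases expect the version: 2 line; recent ones treat it as optional.

```yaml
version: 2

models:
  - name: sales
    description: 'One row per completed sale.'
    columns:
      - name: id
        description: 'Unique identifier of the sale.'
      - name: amount
        description: 'Sale amount in the store currency.'

  - name: customers
    description: 'One row per customer.'   # columns are optional
```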
4. Intermediate: Linking YAML Documentation to Models
Before reading on: do you think dbt automatically connects YAML docs to models by filename or requires manual linking? Commit to your answer.
Concept: Understand how dbt matches YAML documentation entries to actual models.
dbt links documentation to models by matching the 'name' field in YAML to the model's filename (without .sql), so the YAML must use the exact model name. When you run dbt docs generate, dbt uses this link to show descriptions and column info alongside each model.
Result
You can ensure your documentation appears correctly in dbt by matching names.
Knowing this connection prevents broken or missing documentation in your project.
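To make the matching rule concrete, the sketch below assumes a project containing models/sales.sql and models/customers.sql (hypothetical file names). Each name must equal the file name without the .sql extension.

```yaml
models:
  # Documents models/sales.sql
  - name: sales
    description: 'Sales data model.'

  # Documents models/customers.sql
  - name: customers
    description: 'Customer data model.'

  # A name such as sales_data would match no file in this project,
  # so its description would not be attached to any model.
```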
5. Intermediate: Adding Descriptions for Columns
Concept: Learn to document each column inside a model with clear descriptions.
Inside the YAML file, under each model, you add a 'columns' list. Each column has a 'name' and a 'description'. This explains what data the column holds and why it exists. This helps users understand the data without guessing.
Result
Your YAML file includes detailed column-level documentation for models.
Documenting columns improves data transparency and reduces errors in analysis.
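A short sketch of column-level documentation, using a hypothetical customers model. Longer descriptions can use YAML's folded block scalar (>), which joins the indented lines into a single paragraph.

```yaml
models:
  - name: customers
    description: 'Customer information'
    columns:
      - name: id
        description: 'Unique customer ID'
      - name: email
        description: >
          Email address the customer signed up with.
          May be empty for accounts created by phone.
```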
6. Advanced: Using YAML for Tests and Metadata
Before reading on: do you think YAML documentation can also include tests and metadata for models? Commit to your answer.
Concept: Explore how YAML files can include not just descriptions but also tests and metadata for models.
dbt allows you to add tests (like uniqueness or not null) and metadata tags inside the YAML file. This means documentation and quality checks live together. For example, you can specify a test that a column must be unique right in the YAML.
Result
Your YAML file can serve as a single source for documentation and basic data quality rules.
Combining docs and tests in YAML keeps your project organized and easier to maintain.
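One way this could look, assuming a hypothetical sales model; the tag and owner values are illustrative only. The sketch uses the tests key with the built-in unique and not_null tests. On dbt 1.8 and later the preferred key is data_tests, with tests still accepted as an alias.

```yaml
models:
  - name: sales
    description: 'Sales data model'
    config:
      tags: ['finance']            # hypothetical tag
      meta:
        owner: 'analytics-team'    # hypothetical metadata
    columns:
      - name: id
        description: 'Unique identifier'
        tests:                     # data_tests on dbt 1.8+
          - unique
          - not_null
```

Running dbt test then checks these rules against the built model, so the description and the quality check for a column sit side by side.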
7. Expert: Advanced YAML Features and Best Practices
Before reading on: do you think indentation errors in YAML cause silent bugs or clear errors? Commit to your answer.
Concept: Master advanced YAML features like anchors, aliases, and best practices to avoid common pitfalls.
YAML supports anchors (&) and aliases (*) to reuse parts of documentation, reducing repetition. However, YAML is sensitive to indentation; mistakes can cause dbt to fail or misinterpret docs. Best practice includes consistent indentation, comments for clarity, and splitting large docs into multiple files.
Result
You write efficient, error-free YAML documentation that scales with your project.
Understanding YAML's quirks and features prevents frustrating bugs and improves maintainability.
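A minimal sketch of anchors and aliases, assuming two hypothetical models that share an updated_at column. The anchor name updated_at_doc is made up for the example; it is defined on first use so that no extra top-level key is needed.

```yaml
models:
  - name: sales
    columns:
      - name: updated_at
        # &updated_at_doc defines the anchor on its first use.
        description: &updated_at_doc 'Time the row was last updated (UTC).'

  - name: customers
    columns:
      - name: updated_at
        # *updated_at_doc reuses the anchored text.
        description: *updated_at_doc
```

The alias is resolved by the YAML parser before dbt sees the file, so both columns end up with the same description text.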
Under the Hood
dbt reads the YAML files when it parses the project and matches model names in YAML to SQL model files. It walks the YAML structure to extract descriptions, column info, tests, and metadata. This information is stored in the project manifest, rendered in the documentation website, and used to build the tests declared there. YAML parsing relies on indentation and syntax rules, so a syntax error stops the process with a parse error.
Why designed this way?
YAML was chosen because it is human-readable and writable, unlike JSON which is more verbose. Keeping documentation in YAML separate from SQL models allows clear separation of code and docs, making maintenance easier. The design balances simplicity with power, enabling both documentation and testing in one place.
┌─────────────┐       ┌───────────────┐       ┌───────────────┐
│ models.sql  │──────▶│ dbt Compiler  │──────▶│ Documentation │
│ (SQL code)  │       │ (reads YAML)  │       │ Website & UI  │
└─────────────┘       └───────────────┘       └───────────────┘
         ▲                     ▲
         │                     │
┌─────────────┐       ┌───────────────┐
│ models.yml  │──────▶│ YAML Parser   │
│ (YAML docs) │       │ (extract info)│
└─────────────┘       └───────────────┘
Myth Busters - 4 Common Misconceptions
Quick: Do you think YAML indentation errors are ignored or cause failures? Commit to your answer.
Common Belief: YAML indentation is not important; small mistakes won't affect dbt documentation.
Reality: YAML is very sensitive to indentation; even one space off can cause dbt to fail parsing or misinterpret documentation.
Why it matters: Ignoring indentation rules leads to broken documentation or parse errors, wasting time debugging.
Quick: Do you think you must write documentation inside SQL files or separate YAML files? Commit to your answer.
Common Belief: Documentation should be written inside SQL model files as comments for clarity.
Reality: dbt uses separate YAML files for documentation to keep code and docs cleanly separated and easier to manage.
Why it matters: Mixing docs and code makes maintenance harder and can clutter SQL files.
Quick: Do you think dbt automatically documents all columns without explicit YAML entries? Commit to your answer.
Common Belief: dbt automatically documents all columns in a model even if not listed in YAML.
Reality: dbt shows a column description only if you add one in YAML. The generated docs can still list column names and types read from the warehouse, but those columns appear without descriptions.
Why it matters: Assuming automatic docs leads to incomplete documentation and confusion for data users.
Quick: Do you think YAML documentation can include tests and metadata? Commit to your answer.
Common Belief: YAML files are only for descriptions, not for tests or metadata.
Reality: dbt allows tests and metadata to be defined in YAML alongside documentation, enabling integrated quality checks.
Why it matters: Missing this means you lose the chance to centralize docs and tests, reducing project clarity.
Expert Zone
1. YAML anchors and aliases can reduce repetition but are rarely used because they can confuse readers unfamiliar with YAML.
2. The order of models in YAML does not affect dbt, but grouping related models together improves human readability.
3. Descriptions in YAML support markdown formatting, allowing rich text like links and lists in documentation.
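The markdown support mentioned above can be sketched as follows, assuming a hypothetical sales model. The table name raw.orders and the example.com link are placeholders. The literal block scalar (|) keeps line breaks, which markdown lists need.

```yaml
models:
  - name: sales
    description: |
      Daily **sales** facts.

      - One row per order line
      - Source: the hypothetical `raw.orders` table
      - See the [team wiki](https://example.com/wiki/sales) for definitions
```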
When NOT to use
If your project is very small or you prefer inline documentation, you might skip YAML docs and use SQL comments instead. For richer metadata or automated schema management, a dedicated data catalog or schema-management tool can complement dbt.
Production Patterns
Teams use YAML documentation combined with dbt's docs generate command to build internal data catalogs. They integrate YAML docs with CI/CD pipelines to ensure documentation is updated and tested automatically before deployment.
Connections
Data Catalogs
Builds-on
Documenting models in YAML is a foundational step toward creating a full data catalog that helps organizations manage and discover data assets.
Software Documentation
Same pattern
Just like documenting code with comments and README files, documenting data models in YAML helps users understand and trust the system.
Technical Writing
Builds-on
Good YAML documentation requires clear, concise writing skills similar to technical writing, improving communication between data engineers and analysts.
Common Pitfalls
#1 Indentation errors cause YAML parsing failures.
Wrong approach (for example, description indented one space less than name):
models:
  - name: sales
   description: 'Sales data model'
    columns:
      - name: id
        description: 'Unique identifier'
Correct approach:
models:
  - name: sales
    description: 'Sales data model'
    columns:
      - name: id
        description: 'Unique identifier'
Root cause: Misunderstanding YAML's strict indentation rules leads to invalid files.
#2 Model names in YAML do not match SQL filenames.
Wrong approach (the model file is sales.sql):
models:
  - name: sales_data
    description: 'Sales model'
Correct approach:
models:
  - name: sales
    description: 'Sales model'
Root cause: Not realizing dbt links docs by exact model name causes documentation to not appear.
#3 Omitting column descriptions leads to incomplete docs.
Wrong approach:
models:
  - name: customers
    description: 'Customer data'
    columns:
      - name: id
Correct approach:
models:
  - name: customers
    description: 'Customer data'
    columns:
      - name: id
        description: 'Unique customer ID'
Root cause: Assuming column names alone are enough for documentation.
Key Takeaways
Documenting models in YAML makes your data models clear and easy to understand for everyone.
YAML's simple, readable format is perfect for writing descriptions and metadata alongside your dbt models.
Matching model names exactly between YAML and SQL files is crucial for documentation to work.
Including column-level descriptions improves data transparency and reduces errors.
Careful attention to YAML syntax and indentation prevents frustrating errors and broken docs.

Practice

1. What is the main purpose of documenting models in YAML in a dbt project?
easy
A. To write SQL queries inside YAML files
B. To execute dbt models automatically
C. To add clear descriptions for models and columns to improve understanding
D. To store raw data files

Solution

  1. Step 1: Understand the role of YAML documentation

    YAML files in dbt are used to add metadata like descriptions, not to run code or store data.
  2. Step 2: Identify the benefit of documentation

    Adding descriptions for models and columns helps team members understand the data and maintain the project easily.
  3. Final Answer:

    To add clear descriptions for models and columns to improve understanding -> Option C
  4. Quick Check:

    Documentation purpose = Add descriptions [OK]
Hint: Documentation in YAML means adding descriptions, not code [OK]
Common Mistakes:
  • Thinking YAML runs SQL code
  • Confusing YAML with data storage
  • Ignoring the importance of descriptions
2. Which of the following is the correct way to start documenting a model named orders in a YAML file?
easy
A.
    models:
      orders
      description: 'Contains order details'
B.
    model:
      name: orders
      description: 'Contains order details'
C.
    models:
      - orders:
          description: 'Contains order details'
D.
    models:
      - name: orders
        description: 'Contains order details'

Solution

  1. Step 1: Recall YAML syntax for dbt model documentation

    dbt expects a list under models: with each model as a dictionary containing name and description.
  2. Step 2: Match the correct structure

    Option D correctly uses a list item (the dash) holding a dictionary with name and description. The other options use the wrong top-level key or the wrong structure.
  3. Final Answer:

    A list under models: whose item has name: orders and a description -> Option D
  4. Quick Check:

    Model list with name and description = Option D [OK]
Hint: Use dash (-) for list items under models in YAML [OK]
Common Mistakes:
  • Using singular 'model' instead of 'models'
  • Not using dash for list items
  • Incorrect indentation or key names
3. Given this YAML snippet documenting a model and its columns:
models:
  - name: customers
    description: 'Customer information'
    columns:
      - name: id
        description: 'Unique customer ID'
      - name: email
        description: 'Customer email address'
What will dbt show as the description for the email column?
medium
A. Unique customer ID
B. Customer email address
C. Customer information
D. No description

Solution

  1. Step 1: Locate the column description in YAML

    The email column is listed under columns with its own description key.
  2. Step 2: Identify the description text for the email column

    The description for email is 'Customer email address', which dbt will display for that column.
  3. Final Answer:

    Customer email address -> Option B
  4. Quick Check:

    Column description matches YAML text [OK]
Hint: Column descriptions are under columns > name in YAML [OK]
Common Mistakes:
  • Confusing model description with column description
  • Missing indentation causing YAML parsing errors
  • Assuming no description if not repeated
4. You wrote this YAML to document a model but dbt throws an error:
models:
  - name: sales
    description: 'Sales data'
    columns:
      name: amount
      description: 'Sale amount'
What is the error in this YAML?
medium
A. Missing dash (-) before column name and description
B. Incorrect model name key
C. Description should be under models, not columns
D. YAML does not support nested lists

Solution

  1. Step 1: Check YAML list syntax for columns

    Each column should be a list item with a dash (-) before its dictionary of keys.
  2. Step 2: Identify missing dash in columns

    The name and description keys under columns lack the dash, so YAML treats them as keys of columns instead of list items.
  3. Final Answer:

    Missing dash (-) before column name and description -> Option A
  4. Quick Check:

    List items need dash (-) in YAML [OK]
Hint: Use dash (-) before each column in columns list [OK]
Common Mistakes:
  • Forgetting dash for list items
  • Misplacing description keys
  • Confusing YAML lists and dictionaries
5. You want to document two models, users and transactions, each with columns and descriptions. Which YAML structure correctly documents both models with their columns?
hard
A.
    models:
      - name: users
        description: 'User data'
        columns:
          - name: user_id
            description: 'User identifier'
      - name: transactions
        description: 'Transaction data'
        columns:
          - name: transaction_id
            description: 'Transaction identifier'
B.
    models:
      users:
        description: 'User data'
        columns:
          user_id: 'User identifier'
      transactions:
        description: 'Transaction data'
        columns:
          transaction_id: 'Transaction identifier'
C.
    models:
      - users:
          description: 'User data'
          columns:
            - user_id: 'User identifier'
      - transactions:
          description: 'Transaction data'
          columns:
            - transaction_id: 'Transaction identifier'
D.
    models:
      name: users
      description: 'User data'
      columns:
        - name: user_id
          description: 'User identifier'
      name: transactions
      description: 'Transaction data'
      columns:
        - name: transaction_id
          description: 'Transaction identifier'

Solution

  1. Step 1: Understand YAML list structure for multiple models

    dbt expects models as a list of dictionaries, each with name, description, and columns as a list.
  2. Step 2: Evaluate each option's structure

    Option A correctly uses a list with two model dictionaries, each with name, description, and a columns list. Options B and C use model names as keys instead of name entries, and option D repeats the name, description, and columns keys incorrectly at the same level.
  3. Final Answer:

    A list of two model entries, each with name, description, and columns -> Option A
  4. Quick Check:

    Multiple models as list items with name and columns = Option A [OK]
Hint: List each model with dash (-) and include columns as lists [OK]
Common Mistakes:
  • Using model names as keys instead of list items
  • Repeating keys at same level
  • Not using dash for multiple models