
Documenting models in YAML in dbt - Deep Dive

Overview - Documenting models in YAML
What is it?
Documenting models in YAML means writing clear descriptions and details about your data models using a simple text format called YAML. This helps explain what each model does, its columns, and how it fits in the bigger data project. YAML is easy to read and write, making it perfect for sharing information with your team. In dbt, YAML files store this documentation alongside your models.
Why it matters
Without documentation, data models become confusing and hard to use, especially as projects grow or new people join. Documenting models in YAML solves this by making the purpose and structure of data clear and accessible. This saves time, reduces mistakes, and helps everyone trust and understand the data they work with. Imagine trying to use a map without any labels—documentation adds those labels.
Where it fits
Before documenting models, you should understand basic dbt model creation and SQL queries. After learning documentation, you can explore automated testing and data lineage visualization. Documenting models is a key step between building models and ensuring their quality and usability.
Mental Model
Core Idea
Documenting models in YAML is like writing a clear label and instruction sheet for each data model so everyone knows what it is and how to use it.
Think of it like...
It's like putting name tags and descriptions on boxes in a storage room so anyone can find and understand what's inside without opening every box.
┌───────────────────────────────┐
│ dbt Project Folder            │
│ └─ models/                    │
│    ├─ sales.sql               │
│    ├─ customers.sql           │
│    └─ models.yml (YAML file)  │
│       └─ models:              │
│          ├─ name: sales       │
│          │  description: ...  │
│          └─ name: customers   │
│             description: ...  │
└───────────────────────────────┘
Build-Up - 7 Steps
1. Foundation: Understanding YAML Basics
Concept: Learn what YAML is and how its simple structure organizes information.
YAML stands for 'YAML Ain't Markup Language'. It organizes information with indentation and a few simple symbols: key-value pairs use a colon followed by a space (key: value), and list items start with a dash (-). Indentation must use spaces; tab characters are not allowed. This makes YAML easier to read and write by hand than more punctuation-heavy formats such as XML or JSON.
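To make the syntax concrete, here is a small, generic YAML sketch of the building blocks described above: key-value pairs, indentation-based nesting, and lists. The keys (project, owner, settings, sources) are illustrative only and carry no special meaning in dbt.

```yaml
# Key-value pairs: a key, a colon, a space, then the value
project: sales_analytics
owner: data-team

# Nested mapping: the extra indentation (spaces, never tabs)
# shows that "warehouse" and "refresh_hours" belong to "settings"
settings:
  warehouse: analytics_db
  refresh_hours: 24

# List: each item starts with a dash and a space
sources:
  - orders
  - customers
  - payments
```

A YAML parser reads settings as a mapping with two keys and sources as a list of three strings; it also infers 24 as a number rather than text.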
Result
You can write and read basic YAML files with lists and key-value pairs.
Understanding YAML's simple syntax is essential because it is the foundation for writing clear documentation in dbt.
2. Foundation: What Are dbt Models?
Concept: Know what a dbt model is and how it represents a data transformation.
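In dbt, a model is typically a .sql file containing a single SELECT statement. When you run dbt, it builds the result in your data warehouse as a view or a table (or another materialization, such as incremental). Models transform raw data into clean, usable datasets. A model's name is its filename without the extension: models/sales.sql becomes the model sales.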
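Where dbt looks for models, and whether each one becomes a view or a table, is configured in the project's dbt_project.yml file, which is itself YAML. The sketch below assumes a hypothetical project named sales_analytics with a models/marts/ subfolder; model-paths and +materialized are standard dbt settings, but check them against the dbt version you use.

```yaml
# dbt_project.yml (the project and profile names are hypothetical)
name: sales_analytics
version: "1.0.0"
profile: sales_analytics

# Directories where dbt looks for model files
model-paths: ["models"]

models:
  sales_analytics:
    # Default for every model in this project: build it as a view
    +materialized: view
    marts:
      # Models under models/marts/ are built as tables instead
      +materialized: table
```

With this configuration, a file such as models/marts/sales.sql would be built as a table named sales, while models elsewhere in the project would be built as views.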
Result
You can identify and create basic dbt models using SQL files.
Knowing what models are helps you understand what you need to document and why.
3. Intermediate: Structure of Model Documentation in YAML
Before reading on: do you think YAML documentation for models includes only model names, or also details like descriptions and columns? Commit to your answer.
Concept: Learn how to organize model documentation with model names, descriptions, and column details in YAML.
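In dbt, model documentation lives in a YAML file under the models/ directory (common names are schema.yml or _models.yml). The file has a top-level 'models' key, usually preceded by 'version: 2', which older dbt versions require. Each model entry includes the model's name, a description explaining its purpose, and optionally a 'columns' list giving each column's name and description. Indentation and dashes define how these details nest.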
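A minimal sketch of such a file, assuming a hypothetical models/schema.yml and illustrative column names (id, amount) that do not come from any real project:

```yaml
# models/schema.yml (any .yml file under models/ is read; this name is a convention)
version: 2

models:
  - name: sales
    description: "One row per completed sale, cleaned from raw order data."
    columns:
      - name: id
        description: "Unique identifier for the sale."
      - name: amount
        description: "Sale amount."

  - name: customers
    description: "One row per customer."
    # No columns list here: column documentation is optional
```

Only name is required for each model entry; description and columns can be added as the documentation grows.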
Result
You can write a YAML file that documents multiple models with descriptions and column info.
Knowing the structure lets you create documentation that is both human-friendly and machine-readable for dbt.
4. Intermediate: Linking YAML Documentation to Models
Before reading on: do you think dbt connects YAML docs to models automatically by name, or requires manual linking? Commit to your answer.
Concept: Understand how dbt matches YAML documentation entries to actual models.
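dbt links documentation to models by matching the 'name' field in YAML to the model's filename without the .sql extension, so the YAML must use the exact model name. The YAML file can sit anywhere under the models/ directory, although keeping it next to the models it describes is a common convention. If an entry's name matches no model, dbt does not attach it to anything, and recent versions report a warning when they parse the project. Running dbt docs generate, then dbt docs serve, builds and opens a documentation site that shows these descriptions and column details alongside each model.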
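The sketch below shows the matching rule with a hypothetical folder layout, written as comments at the top of the file. Only the name values matter for linking; the file name _marts.yml is just a convention.

```yaml
# Hypothetical layout assumed by this sketch:
#   models/marts/sales.sql      -> model name "sales"
#   models/marts/customers.sql  -> model name "customers"
#   models/marts/_marts.yml     -> this file
version: 2

models:
  - name: sales        # must equal the filename sales.sql without ".sql"
    description: "One row per completed sale."
  - name: customers    # a typo such as "customer" would match no model
    description: "One row per customer."
```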
Result
You can ensure your documentation appears correctly in dbt by matching names.
Knowing this connection prevents broken or missing documentation in your project.
5. Intermediate: Adding Descriptions for Columns
Concept: Learn to document each column inside a model with clear descriptions.
Inside the YAML file, under each model, you add a 'columns' list. Each entry has a 'name' that matches a column in the model's output and a 'description' that explains what data the column holds and why it exists. With these descriptions, users can understand the data without guessing.
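A sketch of richer column entries, using hypothetical columns signup_date and lifetime_value. It also shows two YAML details that often matter in descriptions: quoting text that contains a colon followed by a space, and the folded block style (>) for long text.

```yaml
version: 2

models:
  - name: customers
    description: "One row per customer."
    columns:
      - name: id
        description: "Unique customer ID."
      - name: signup_date
        # Quoted because the text contains ": ", which YAML would
        # otherwise try to read as another key-value pair
        description: "Date the account was created (timezone: UTC)."
      - name: lifetime_value
        # ">" folds the indented lines below into one paragraph
        description: >
          Total revenue from this customer across all orders,
          net of refunds.
```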
Result
Your YAML file includes detailed column-level documentation for models.
Documenting columns improves data transparency and reduces errors in analysis.
6. Advanced: Using YAML for Tests and Metadata
Before reading on: do you think YAML documentation can also include tests and metadata for models? Commit to your answer.
Concept: Explore how YAML files can include not just descriptions but also tests and metadata for models.
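dbt also lets you declare tests in the same YAML file, under a 'tests' key (dbt 1.8 introduced the name 'data_tests' and still accepts 'tests'). Built-in generic tests include unique, not_null, accepted_values, and relationships. You can also attach tags and free-form 'meta' metadata to models and columns. Documentation and quality checks then live together: for example, you can require that a column be unique right next to its description.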
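A sketch combining descriptions, tests, and metadata for one model. The tag core, the owner value, and the allowed status values are hypothetical; unique, not_null, and accepted_values are built-in dbt tests. Exact key placement can differ between dbt releases, so check the documentation for your version.

```yaml
version: 2

models:
  - name: customers
    description: "One row per customer."
    config:
      tags: ["core"]            # hypothetical tag, selectable with --select tag:core
      meta:
        owner: "data-team"      # hypothetical free-form metadata
    columns:
      - name: id
        description: "Unique customer ID."
        tests:                  # dbt 1.8+ also accepts the key data_tests
          - unique
          - not_null
      - name: status
        description: "Account status."
        tests:
          - accepted_values:
              values: ["active", "churned"]   # hypothetical allowed values
```

Running dbt test, or dbt build, turns each entry under tests into a query against the built model; a test fails when that query returns any rows.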
Result
Your YAML file can serve as a single source for documentation and basic data quality rules.
Combining docs and tests in YAML keeps your project organized and easier to maintain.
7. Expert: Advanced YAML Features and Best Practices
Before reading on: do you think indentation errors in YAML cause silent bugs or clear errors? Commit to your answer.
Concept: Master advanced YAML features like anchors, aliases, and best practices to avoid common pitfalls.
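YAML supports anchors (&name) and aliases (*name), which let you define a block once and reuse it elsewhere in the same file, reducing repetition; anchors do not carry across files. YAML is also sensitive to indentation: some mistakes make the file unparseable, so dbt stops with an error, while others produce valid YAML with a different structure, so a description silently attaches to the wrong model or column. Best practices include consistent two-space indentation, comments (lines starting with #) for clarity, and splitting large documentation into several files, for example one per folder of models.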
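A sketch of anchors and aliases in a dbt properties file. The anchor name unique_not_null is arbitrary: &unique_not_null labels the list of tests the first time it appears, and *unique_not_null reuses it later in the same file. This illustrates the YAML feature; whether your dbt version accepts anchors in every position is worth checking before relying on it.

```yaml
version: 2

models:
  - name: sales
    columns:
      - name: id
        description: "Unique identifier for the sale."
        # "&unique_not_null" names this list so it can be reused below
        tests: &unique_not_null
          - unique
          - not_null

  - name: customers
    columns:
      - name: id
        description: "Unique customer ID."
        # "*unique_not_null" is an alias: it stands for the same list
        tests: *unique_not_null
```

Editing the anchored list changes both columns at once, which is the benefit and also the reason readers unfamiliar with YAML can be surprised.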
Result
You write efficient, error-free YAML documentation that scales with your project.
Understanding YAML's quirks and features prevents frustrating bugs and improves maintainability.
Under the Hood
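dbt reads the YAML files when it parses the project, before it compiles or runs anything. It matches each model name in YAML to a model built from a .sql file and extracts descriptions, column information, tests, and metadata. The result is stored in the project manifest (target/manifest.json). dbt docs generate combines it with column names and types queried from the warehouse to build the documentation site, and each test declared in YAML becomes a test that dbt test runs. Because YAML parsing depends on indentation and syntax rules, a single error can stop the whole process.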
Why designed this way?
YAML was chosen because it is easy for people to read and write and needs less punctuation than JSON. Keeping documentation in YAML files, separate from SQL models, cleanly separates code from docs and makes both easier to maintain. The design balances simplicity with power, letting documentation and tests live in one place.
┌──────────────┐
│ models/*.sql │───┐
│ (SQL code)   │   │    ┌───────────────┐     ┌───────────────┐     ┌───────────────┐
└──────────────┘   ├───▶│ dbt parser    │────▶│ manifest.json │────▶│ Documentation │
┌──────────────┐   │    │ (matches YAML │     │ (models, docs,│     │ website & UI  │
│ models/*.yml │───┘    │  to models)   │     │  tests)       │     └───────────────┘
│ (YAML docs)  │        └───────────────┘     └───────────────┘
└──────────────┘
Myth Busters - 4 Common Misconceptions
Quick: Do you think YAML indentation errors are ignored or cause failures? Commit to your answer.
Common Belief: YAML indentation is not important; small mistakes won't affect dbt documentation.
Reality: YAML is very sensitive to indentation; even one space off can make dbt fail to parse the file or misread the documentation.
Why it matters: Ignoring indentation rules leads to broken documentation or parse errors, wasting time on debugging.
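Not every indentation slip is a parse error. In the sketch below the file is valid YAML, so nothing fails, yet the text meant for the id column becomes the sales model's description, because its indentation places it at the model's level:

```yaml
version: 2

models:
  - name: sales
    columns:
      - name: id
    # Meant for the id column, but this line is indented to the
    # model's level, so YAML assigns it to the sales model instead
    description: "Unique identifier."
```

The fix is to indent description so it lines up with name inside the "- name: id" entry, which places it in the column's mapping.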
Quick: Do you think you must write documentation inside SQL files or separate YAML files? Commit to your answer.
Common Belief: Documentation should be written inside SQL model files as comments for clarity.
Reality: dbt uses separate YAML files for documentation to keep code and docs cleanly separated and easier to manage; SQL comments are not turned into model or column descriptions.
Why it matters: Mixing docs and code makes maintenance harder and can clutter SQL files.
Quick: Do you think dbt automatically documents all columns without explicit YAML entries? Commit to your answer.
Common Belief: dbt automatically documents all columns in a model even if they are not listed in YAML.
Reality: dbt shows a column description only if you add one in YAML. The site built by dbt docs generate still lists every column's name and data type, read from the warehouse, but unlisted columns appear without descriptions.
Why it matters: Assuming automatic docs leads to incomplete documentation and confusion for data users.
Quick: Do you think YAML documentation can include tests and metadata? Commit to your answer.
Common Belief: YAML files are only for descriptions, not for tests or metadata.
Reality: dbt allows tests and metadata to be defined in YAML alongside documentation, enabling integrated quality checks.
Why it matters: Missing this means you lose the chance to centralize docs and tests, reducing project clarity.
Expert Zone
1. YAML anchors and aliases can reduce repetition, but many teams use them sparingly because they can confuse readers unfamiliar with YAML.
2. The order of models in a YAML file does not affect dbt, but grouping related models together improves readability.
3. Descriptions in YAML support Markdown formatting, allowing rich text such as links and lists in the documentation site.
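A sketch of a Markdown description. The literal block style (|) preserves line breaks, which Markdown needs for separate paragraphs and list items; the grain and exclusion details are hypothetical.

```yaml
version: 2

models:
  - name: sales
    # "|" keeps line breaks, so Markdown paragraphs and lists survive
    description: |
      One row per completed sale.

      **Grain:** one row per order ID.

      Excludes:

      - test orders
      - orders cancelled before payment

      See the [dbt documentation](https://docs.getdbt.com) for more.
```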
When NOT to use
If your project is very small or you prefer inline documentation, you might skip YAML docs and rely on SQL comments instead, keeping in mind that comments stay in the code and are not shown as model or column descriptions. For complex metadata or automated schema management, consider dedicated tools such as JSON Schema alongside dbt.
Production Patterns
Teams combine YAML documentation with the dbt docs generate command to build internal data catalogs. They also run dbt in CI/CD pipelines so that invalid YAML or failing tests block a change before deployment, which keeps documentation and tests current.
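One possible CI setup, sketched as a GitHub Actions workflow. Much of it is an assumption for illustration: the file name, the Python version, the dbt-postgres adapter, and that a profiles.yml with warehouse credentials is made available to the job (not shown). The dbt commands themselves (parse, build, docs generate) are standard dbt CLI commands.

```yaml
# .github/workflows/dbt-ci.yml (hypothetical file name)
name: dbt CI
on: pull_request

jobs:
  dbt:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      # Assumption: Postgres adapter; install the one for your warehouse
      - run: pip install dbt-postgres
      # Assumption: profiles.yml and credentials are provided to the job (not shown)
      - run: dbt parse          # fails fast on invalid YAML
      - run: dbt build          # builds models and runs the tests declared in YAML
      - run: dbt docs generate  # refreshes the documentation artifacts
```

Running dbt parse first means a broken YAML file fails the job before any models are built.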
Connections
Data Catalogs
Builds-on
Documenting models in YAML is a foundational step toward creating a full data catalog that helps organizations manage and discover data assets.
Software Documentation
Same pattern
Just like documenting code with comments and README files, documenting data models in YAML helps users understand and trust the system.
Technical Writing
Builds-on
Good YAML documentation requires clear, concise writing skills similar to technical writing, improving communication between data engineers and analysts.
Common Pitfalls
#1 Indentation errors cause YAML parsing failures.
Wrong approach (the id column's description is not aligned with its name, so the file cannot be parsed):
models:
  - name: sales
    description: 'Sales data model'
    columns:
      - name: id
      description: 'Unique identifier'
Correct approach:
models:
  - name: sales
    description: 'Sales data model'
    columns:
      - name: id
        description: 'Unique identifier'
Root cause: Misunderstanding YAML's strict indentation rules leads to invalid files.
#2 Model names in YAML do not match SQL filenames.
Wrong approach (the model file is sales.sql, but the entry is named sales_data):
models:
  - name: sales_data
    description: 'Sales model'
Correct approach:
models:
  - name: sales
    description: 'Sales model'
Root cause: Not realizing that dbt links docs by exact model name, so the documentation never appears.
#3 Omitting column descriptions leads to incomplete docs.
Wrong approach:
models:
  - name: customers
    description: 'Customer data'
    columns:
      - name: id
Correct approach:
models:
  - name: customers
    description: 'Customer data'
    columns:
      - name: id
        description: 'Unique customer ID'
Root cause: Assuming column names alone are enough for documentation.
Key Takeaways
Documenting models in YAML makes your data models clear and easy to understand for everyone.
YAML's simple, readable format is perfect for writing descriptions and metadata alongside your dbt models.
Matching model names exactly between YAML and SQL files is crucial for documentation to work.
Including column-level descriptions improves data transparency and reduces errors.
Careful attention to YAML syntax and indentation prevents frustrating errors and broken docs.