
Why documentation makes data discoverable in dbt - Visual Breakdown

Concept Flow - Why documentation makes data discoverable
Start: Data exists in dbt models
Add documentation to models
Generate the docs site with 'dbt docs generate'
Users search and explore docs
Users find data definitions and lineage
Data becomes discoverable and trusted
This flow shows how adding documentation to dbt models leads to generating a docs site, which users explore to find and trust data.
Execution Sample
dbt
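models/sales.sql:
select * from {{ source('raw', 'sales') }}

models/schema.yml:
sources:
  - name: raw
    tables:
      - name: sales
models:
  - name: sales
    description: 'Sales data by region'

# Run: dbt docs generate, then dbt docs serve
This code declares the raw source table and writes the model's description in a YAML properties file. It then generates and serves the documentation site. dbt does not read SQL comments as documentation, so the description has to live in the YAML file. Referencing the table through source() rather than a hard-coded name also lets the lineage graph show where the sales model reads from.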
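To see where a model's YAML description ends up, here is a minimal Python sketch that reads the target/manifest.json artifact. 'dbt docs generate' writes this file next to catalog.json and index.html, and the docs site loads it. The sketch assumes the manifest layout of recent dbt versions: `nodes` maps unique IDs such as `model.my_project.sales` to dictionaries with `resource_type`, `name` and `description` keys. The helper names `load_manifest` and `model_descriptions`, and the project name `my_project`, are hypothetical.

```python
import json
from pathlib import Path


def load_manifest(path: str = "target/manifest.json") -> dict:
    """Read the manifest artifact from dbt's default target directory."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def model_descriptions(manifest: dict) -> dict[str, str]:
    """Map each model name to its description ('' when none was written)."""
    return {
        node["name"]: node.get("description", "")
        for node in manifest.get("nodes", {}).values()
        if node.get("resource_type") == "model"
    }


if __name__ == "__main__":
    # Hypothetical, trimmed-down manifest with the same shape as the real file.
    # Against a real project you would call model_descriptions(load_manifest()).
    sample = {
        "nodes": {
            "model.my_project.sales": {
                "resource_type": "model",
                "name": "sales",
                "description": "Sales data by region",
            }
        }
    }
    print(model_descriptions(sample))  # {'sales': 'Sales data by region'}
```

The real manifest contains many more keys per node. Only the few fields needed to show the description are modeled here.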
Execution Table
Step | Action | Input | Output | Effect
1 | Add a model description in a YAML properties file | SQL model file plus schema.yml with a description | Model and YAML files saved | Description attached to the model's properties
2 | Run 'dbt docs generate' | Project files with descriptions | Documentation site files (index.html, manifest.json, catalog.json in target/) | Docs site built with model info
3 | Run 'dbt docs serve' | Docs site files | Local web server running (port 8080 by default) | Docs site accessible in browser
4 | User opens docs site | Browser request | Docs homepage loads | User sees searchable data docs
5 | User searches for 'sales' | Search input | Filtered docs page | User finds sales data details
6 | User views data lineage | Click lineage tab | Lineage graph shown | User understands data source and flow
7 | End | User satisfied | Data discovered | Data is discoverable and trusted
💡 The process ends when the user finds and understands the data through the documentation site.
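Steps 5 and 6 happen in the browser, but the same information is in manifest.json. Here is a rough Python approximation of keyword search and upstream lineage. It is not how the docs site is implemented, and the real search covers more fields (tags, sources and so on). The sketch assumes the recent manifest layout: each node lists its parents' unique IDs under `depends_on.nodes`, and source IDs look like `source.my_project.raw.sales`. The function names `search_models` and `upstream` are hypothetical.

```python
def search_models(manifest: dict, keyword: str) -> list[str]:
    """Return names of models whose name, description, or declared column
    names contain the keyword (case-insensitive)."""
    needle = keyword.lower()
    hits = []
    for node in manifest.get("nodes", {}).values():
        if node.get("resource_type") != "model":
            continue
        haystack = [node.get("name", ""), node.get("description") or ""]
        haystack += list(node.get("columns", {}).keys())
        if any(needle in text.lower() for text in haystack):
            hits.append(node["name"])
    return sorted(hits)


def upstream(manifest: dict, unique_id: str) -> set[str]:
    """Collect every node or source the given node depends on, directly or indirectly."""
    nodes = manifest.get("nodes", {})
    seen: set[str] = set()
    stack = list(nodes.get(unique_id, {}).get("depends_on", {}).get("nodes", []))
    while stack:
        parent = stack.pop()
        if parent in seen:
            continue
        seen.add(parent)
        # Sources are stored under manifest["sources"], not "nodes",
        # so the lookup below finds no further parents for them.
        stack.extend(nodes.get(parent, {}).get("depends_on", {}).get("nodes", []))
    return seen


if __name__ == "__main__":
    # Hypothetical manifest fragment: sales reads a source, sales_summary reads sales.
    sample = {
        "nodes": {
            "model.my_project.sales": {
                "resource_type": "model",
                "name": "sales",
                "description": "Sales data by region",
                "columns": {"region": {"name": "region"}},
                "depends_on": {"nodes": ["source.my_project.raw.sales"]},
            },
            "model.my_project.sales_summary": {
                "resource_type": "model",
                "name": "sales_summary",
                "description": "Totals per region",
                "depends_on": {"nodes": ["model.my_project.sales"]},
            },
        }
    }
    print(search_models(sample, "Region"))  # ['sales', 'sales_summary']
    print(sorted(upstream(sample, "model.my_project.sales_summary")))
    # ['model.my_project.sales', 'source.my_project.raw.sales']
```

Search only finds what has been written down. A model with an empty description can be matched by name alone, which is why descriptions matter for discovery.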
Variable Tracker
Variable | Start | After Step 1 | After Step 2 | After Step 3 | After Step 4 | After Step 5 | After Step 6 | Final
Documentation | None | Added to model | Included in docs site files | Docs site running | Docs site loaded | Search results filtered | Lineage graph displayed | Data discoverable
Key Moments - 3 Insights
Why do we need to run 'dbt docs generate' after adding documentation?
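'dbt docs generate' does three things. It parses the project, including the YAML descriptions. It queries the warehouse for table and column metadata. It then writes the documentation site files to the target/ directory. Without this step the site files are not rebuilt, so the docs site may not reflect the new descriptions. See step 2 of the execution table.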
How does the documentation help users find data?
The documentation site makes descriptions searchable and displays lineage, so users can search by keyword and see where data comes from. This is shown in steps 4 to 6 of the execution table.
What happens if we don't add documentation comments in models?
The docs site still lists models, columns and lineage, but the descriptions are empty. Users then have to read SQL to work out what the data means and whether to trust it. Documentation is what makes data discoverable: without it, step 1 adds nothing for the later steps to surface.
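One way to catch missing documentation before users notice is to scan the manifest for empty descriptions. The following is a minimal Python sketch assuming the recent manifest.json layout. Note that a model's `columns` entry in the manifest only holds columns declared in YAML; columns discovered in the warehouse appear in catalog.json instead. The function name `undocumented` and the `<model>` marker are hypothetical.

```python
def undocumented(manifest: dict) -> dict[str, list[str]]:
    """Map each model name to what lacks a description: '<model>' for the
    model itself, plus the names of YAML-declared columns with no description."""
    gaps: dict[str, list[str]] = {}
    for node in manifest.get("nodes", {}).values():
        if node.get("resource_type") != "model":
            continue
        missing = []
        if not (node.get("description") or "").strip():
            missing.append("<model>")
        for col_name, col in node.get("columns", {}).items():
            if not (col.get("description") or "").strip():
                missing.append(col_name)
        if missing:
            gaps[node["name"]] = missing
    return gaps


if __name__ == "__main__":
    # Hypothetical manifest fragment with one partly and one fully undocumented model.
    sample = {
        "nodes": {
            "model.my_project.sales": {
                "resource_type": "model",
                "name": "sales",
                "description": "Sales data by region",
                "columns": {
                    "region": {"name": "region", "description": "Sales region"},
                    "amount": {"name": "amount", "description": ""},
                },
            },
            "model.my_project.orders": {
                "resource_type": "model",
                "name": "orders",
                "description": "",
            },
        }
    }
    print(undocumented(sample))  # {'sales': ['amount'], 'orders': ['<model>']}
```

A check like this could run in CI after 'dbt docs generate' and fail the build when the returned dictionary is not empty. That policy is a design choice, not something dbt enforces.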
Visual Quiz - 3 Questions
Test your understanding
In the execution table, what is the output of running 'dbt docs generate'?
ADocumentation site files
BModel files saved
CLocal web server running
DDocs homepage loads
💡 Hint
Check the Output column of step 2 in the execution table.
At which step does the user first interact with the documentation site in a browser?
AStep 3
BStep 4
CStep 5
DStep 6
💡 Hint
Look for 'User opens docs site' in the execution table.
If models have no descriptions, how does that affect the 'Documentation' variable in the variable tracker?
AIt shows 'Docs site running'
BIt becomes 'Added to model' anyway
CIt stays 'None' throughout
DIt shows 'Search results filtered'
💡 Hint
Compare the Start and After Step 1 values in the 'Documentation' row of the variable tracker.
Concept Snapshot
Why documentation makes data discoverable:
- Add descriptions for models and columns in YAML properties files
- Run 'dbt docs generate' to build docs site
- Run 'dbt docs serve' to view site locally
- Users search and explore docs to find data
- Documentation shows data definitions and lineage
- This makes data easier to find and trust
Full Transcript
This visual execution shows how adding documentation to dbt models leads to a documentation site that users can explore. First, a description for the model is added in a YAML properties file. Next, running 'dbt docs generate' creates the documentation site files. Running 'dbt docs serve' then starts a local web server to view them. Users open the site in a browser, search for data such as 'sales', and view data lineage. This process makes data discoverable and trusted. The key moments cover three points: why docs must be regenerated after adding descriptions, how docs help users find data, and the impact of missing documentation. The quizzes test understanding of these steps and of the variable changes, and the snapshot summarizes the key steps to make data discoverable with documentation in dbt.

Practice

1. Why is documentation important in dbt projects for data discoverability?
easy
A. It speeds up the data processing time.
B. It explains data clearly so users can find and understand it easily.
C. It automatically fixes errors in data models.
D. It encrypts data for security.

Solution

  1. Step 1: Understand the purpose of documentation in dbt

    Documentation provides clear explanations about data models and columns.
  2. Step 2: Connect documentation to data discoverability

    Clear explanations help users find and understand data easily, improving discoverability.
  3. Final Answer:

    It explains data clearly so users can find and understand it easily. -> Option B
  4. Quick Check:

    Documentation improves discoverability [OK]
Hint: Documentation means clear explanations for easy data finding [OK]
Common Mistakes:
  • Confusing documentation with data processing speed
  • Thinking documentation fixes data errors automatically
  • Assuming documentation encrypts data
2. Which of the following is the correct way to add a description to a dbt model in YAML?
easy
A. models: - name: sales description: 'Contains sales data by region'
B. models: name: sales description: 'Contains sales data by region'
C. model: - name: sales description: 'Contains sales data by region'
D. models: - sales: description: 'Contains sales data by region'

Solution

  1. Step 1: Recall YAML structure for dbt model descriptions

    The correct syntax uses 'models:' followed by a list with '- name:' and 'description:' keys.
  2. Step 2: Identify the option matching this structure

    models: - name: sales description: 'Contains sales data by region' correctly uses a list item with 'name' and 'description' under 'models'.
  3. Final Answer:

    models: - name: sales description: 'Contains sales data by region' -> Option A
  4. Quick Check:

    Correct YAML list syntax [OK]
Hint: YAML lists use dash and indentation for model descriptions [OK]
Common Mistakes:
  • Missing dash for list items
  • Using singular 'model' instead of 'models'
  • Incorrect indentation breaking YAML format
Given this YAML snippet in a dbt properties file (for example, models/schema.yml):
models:
  - name: customers
    description: 'Customer details including name and email'
  - name: orders
    description: 'Order records with dates and amounts'
What will dbt documentation show for the 'orders' model?
medium
A. Error loading description
B. Customer details including name and email
C. Order records with dates and amounts
D. No description available

Solution

  1. Step 1: Locate the 'orders' model in the YAML snippet

    The 'orders' model is listed with a description: 'Order records with dates and amounts'.
  2. Step 2: Understand dbt documentation usage

    dbt uses the description text to show model info in docs.
  3. Final Answer:

    Order records with dates and amounts -> Option C
  4. Quick Check:

    Model description matches YAML text [OK]
Hint: Match model name to its description in YAML [OK]
Common Mistakes:
  • Mixing descriptions between models
  • Assuming missing description means error
  • Confusing model names
4. You wrote this YAML for a dbt model description but the docs show no description:
models:
  name: products
  description: 'Product catalog details'
What is the likely error?
medium
A. Missing dash (-) before 'name' to define list item
B. Incorrect key 'description' instead of 'desc'
C. YAML does not support descriptions
D. Model name should be uppercase

Solution

  1. Step 1: Check YAML list syntax for models

    dbt expects 'models:' followed by a list indicated by '-'. Missing dash means no list item.
  2. Step 2: Identify the missing dash before 'name'

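    Without '-', YAML parses 'models' as a single mapping with 'name' and 'description' keys instead of a list of model entries. dbt expects a list here, so it cannot attach the description to 'products'. It typically reports a parsing error rather than silently applying the block.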
  3. Final Answer:

    Missing dash (-) before 'name' to define list item -> Option A
  4. Quick Check:

    Dash defines list items in YAML [OK]
Hint: Always use dash for list items in YAML [OK]
Common Mistakes:
  • Using wrong key names
  • Thinking YAML disallows descriptions
  • Ignoring YAML indentation rules
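The difference between the two YAML shapes is easiest to see in their parsed form. This Python sketch writes out by hand what a YAML parser would return for the broken snippet and for the corrected one, then applies a list check. `model_entries` is a hypothetical helper written for illustration. It mirrors the idea that dbt expects a list under `models`, but it is not dbt's own validation code.

```python
def model_entries(properties: dict) -> list[dict]:
    """Return the model entries under 'models', rejecting the mapping form.

    Hypothetical check for illustration; dbt performs its own validation.
    """
    entries = properties.get("models", [])
    if not isinstance(entries, list):
        raise ValueError(
            f"'models' must be a list of '- name: ...' items, got {type(entries).__name__}"
        )
    return entries


# Parsed form of the snippet in the question (no dash before 'name'): a mapping.
broken = {"models": {"name": "products", "description": "Product catalog details"}}
# Parsed form of the fix ('- name: products' with description indented under it): a list.
fixed = {"models": [{"name": "products", "description": "Product catalog details"}]}

if __name__ == "__main__":
    print(model_entries(fixed)[0]["description"])  # Product catalog details
    try:
        model_entries(broken)
    except ValueError as err:
        print(err)  # 'models' must be a list of '- name: ...' items, got dict
```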
5. You want to improve data discoverability by adding descriptions to columns in a dbt model. Which YAML snippet correctly documents the 'customer_id' column with a description?
hard
A. models: - name: customers columns: - name: customer_id desc: 'Unique ID for each customer'
B. models: - name: customers columns: customer_id: 'Unique ID for each customer'
C. models: - name: customers columns: - customer_id: 'Unique ID for each customer'
D. models: - name: customers columns: - name: customer_id description: 'Unique ID for each customer'

Solution

  1. Step 1: Recall correct YAML structure for column documentation in dbt

    Columns are listed as items with '- name:' and 'description:' keys.
  2. Step 2: Identify the option matching this structure

    models: - name: customers columns: - name: customer_id description: 'Unique ID for each customer' correctly uses '- name: customer_id' and 'description' key.
  3. Final Answer:

    models: - name: customers columns: - name: customer_id description: 'Unique ID for each customer' -> Option D
  4. Quick Check:

    Correct column description syntax [OK]
Hint: Use '- name:' and 'description:' for columns in YAML [OK]
Common Mistakes:
  • Using key-value pairs without dash for columns
  • Using 'desc' instead of 'description'
  • Incorrect indentation breaking YAML
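To see why option A fails while option D works, this Python sketch writes out the parsed form of both snippets for the `customers` model and reads column descriptions the same way the docs do: only from the `description` key. The helper name `column_descriptions` is hypothetical. The sketch covers options A and D only, since B and C do not even produce a list of named column entries.

```python
def column_descriptions(model_entry: dict) -> dict[str, str]:
    """Map each column name to its description.

    Only the 'description' key is read, so a column documented under a
    different key (such as 'desc') ends up with an empty description.
    """
    return {
        col["name"]: col.get("description", "")
        for col in model_entry.get("columns", [])
    }


# Parsed form of option D (correct) and option A ('desc' instead of 'description').
option_d = {
    "name": "customers",
    "columns": [{"name": "customer_id", "description": "Unique ID for each customer"}],
}
option_a = {
    "name": "customers",
    "columns": [{"name": "customer_id", "desc": "Unique ID for each customer"}],
}

if __name__ == "__main__":
    print(column_descriptions(option_d))  # {'customer_id': 'Unique ID for each customer'}
    print(column_descriptions(option_a))  # {'customer_id': ''}
```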