
Why documentation makes data discoverable in dbt - Why It Works This Way

Overview - Why documentation makes data discoverable
What is it?
Documentation in data science is the detailed information that explains what data exists, where it comes from, and how it should be used. It helps people understand data assets clearly without guessing. When data is well documented, it becomes easier to find and use correctly. This is especially important in tools like dbt, which manage data transformations and models.
Why it matters
Without documentation, data users waste time searching for the right data or misunderstand its meaning, leading to errors and bad decisions. Documentation makes data discoverable by providing clear descriptions, context, and usage instructions. This saves time, improves trust in data, and helps teams work better together. Imagine trying to cook a dish without a recipe: documentation is the recipe for data.
Where it fits
Before learning about documentation, you should understand basic data concepts like tables, columns, and data models. After mastering documentation, you can explore data governance, data catalogs, and advanced data lineage tools. Documentation is a bridge between raw data and effective data use.
Mental Model
Core Idea
Documentation acts as a clear map that guides users to find and understand data quickly and correctly.
Think of it like...
Documentation is like labels and instructions on food packages in a supermarket; without them, you wouldn’t know what’s inside or how to use it safely.
┌─────────────────────────────┐
│       Data Assets           │
├─────────────┬───────────────┤
│ Raw Tables  │ Transformed   │
│             │ Models        │
├─────────────┴───────────────┤
│       Documentation         │
│  - Descriptions            │
│  - Sources                 │
│  - Usage Notes             │
└─────────────┬───────────────┘
              │
              ▼
      Data Discoverability
      (Easy to find & use)
Build-Up - 6 Steps
1. Foundation: What is data documentation
Concept: Introduction to what data documentation means and its basic components.
Data documentation includes descriptions of data tables, columns, sources, and how data is transformed. It explains what each piece of data means and how it should be used. In dbt, documentation is written alongside data models to keep explanations close to the data itself.
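As a rough illustration of these components, the sketch below documents one raw source table and its columns in a dbt properties file. The file path, the source name `shop`, the schema `raw_shop`, and the table and column names are all hypothetical, chosen only for this example.

```yaml
# models/staging/sources.yml  (hypothetical path and names)
version: 2

sources:
  - name: shop                # hypothetical source name
    schema: raw_shop          # hypothetical schema holding the raw tables
    description: "Raw tables loaded from the online shop's application database."
    tables:
      - name: orders
        description: "One row per order as recorded by the shop, before any cleaning."
        columns:
          - name: order_id
            description: "Unique identifier of the order."
          - name: ordered_at
            description: "Timestamp when the customer placed the order."
```

Each `description` field is free text, so it can state where the data comes from and how it should be used.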
Result
You understand that documentation is more than just notes; it is structured information that explains data clearly.
Understanding that documentation is structured information helps you see it as a tool for communication, not just extra text.
2. Foundation: Why data discoverability matters
Concept: Explaining the importance of being able to find and understand data easily.
Data discoverability means users can quickly find the right data and know how to use it. Without it, teams waste time guessing or using wrong data. Documentation is the key to making data discoverable by providing clear explanations and context.
Result
You realize that discoverability saves time and reduces mistakes in data work.
Knowing why discoverability matters motivates you to value and create good documentation.
3. Intermediate: How dbt supports documentation
🤔 Before reading on: do you think dbt stores documentation separately or with data models? Commit to your answer.
Concept: dbt integrates documentation directly with data models for easy maintenance and access.
In dbt, you write documentation in the same project as your data models using YAML files. This keeps descriptions close to the code that creates the data. dbt can then generate a website showing all documentation, making data easy to explore.
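A minimal sketch of such a YAML file might look like the following. The path, the model name `customer_orders`, and the columns are assumptions made up for illustration; in a real project the `name` would have to match an existing model.

```yaml
# models/marts/schema.yml  (hypothetical path; documents a model named customer_orders)
version: 2

models:
  - name: customer_orders     # hypothetical model; must match the model's name
    description: "One row per customer with lifetime order counts and revenue."
    columns:
      - name: customer_id
        description: "Unique identifier of the customer."
      - name: order_count
        description: "Number of completed orders placed by the customer."
      - name: total_revenue
        description: "Sum of order amounts for the customer, in US dollars."
```

With a file like this in place, running `dbt docs generate` followed by `dbt docs serve` builds the documentation site and serves it locally.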
Result
You see how dbt makes documentation part of the data workflow, not an afterthought.
Understanding dbt’s integrated approach shows how documentation stays accurate and up-to-date.
4. Intermediate: Documentation improves data trust
🤔 Before reading on: does documentation only help find data, or can it also affect trust? Commit to your answer.
Concept: Clear documentation builds confidence in data quality and meaning.
When data users read detailed documentation, they understand where data comes from and how it was processed. This transparency helps them trust the data and use it correctly. Without documentation, users may doubt data or misuse it.
Result
You appreciate that documentation is not just about finding data but also about trusting it.
Knowing that documentation builds trust helps prioritize it as a critical part of data projects.
5. Advanced: Documentation as a discovery tool in dbt docs site
🤔 Before reading on: do you think documentation websites only show text, or can they help explore data relationships? Commit to your answer.
Concept: dbt’s documentation website provides interactive exploration of data models and their relationships.
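dbt generates a docs site (built with `dbt docs generate` and served locally with `dbt docs serve`) that shows tables, columns, descriptions, and how models depend on each other. Users can click through models and a lineage graph to understand where data comes from and what depends on it. This interactive site makes discovering data intuitive and visual.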
Result
You see documentation as an interactive tool, not just static text.
Understanding the docs site’s interactive nature reveals how documentation can actively guide data exploration.
6. Expert: Challenges and best practices in documentation
🤔 Before reading on: do you think documentation is easy to keep updated, or does it often become outdated? Commit to your answer.
Concept: Maintaining accurate documentation requires discipline and automation to avoid decay.
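Documentation often becomes outdated if it is not maintained alongside data changes. dbt encourages writing docs in the same files where models are configured, and its automated data tests can check that the data still matches what the descriptions claim (for example, that an ID column is unique and never null). Best practices include clear writing, regular reviews, and involving the whole team.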
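One way to keep a description and the data it describes in step is to declare tests next to the description, as in this sketch. The model and column names are hypothetical, and the key is written as `tests` here; newer dbt versions also accept `data_tests`, so check which form your version expects.

```yaml
# Hypothetical properties file: description and tests live side by side
version: 2

models:
  - name: customer_orders     # hypothetical model
    description: "One row per customer with lifetime order counts and revenue."
    columns:
      - name: customer_id
        description: "Unique identifier of the customer. Never null."
        tests:                # may be named data_tests, depending on dbt version
          - unique            # fails if the column contains duplicate values
          - not_null          # fails if the column contains nulls
```

If the data stops satisfying the claim in the description, the test fails, which signals that either the model or its documentation needs attention.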
Result
You understand that documentation is a living part of data projects needing care and process.
Knowing the challenges of documentation upkeep helps you design workflows that keep data discoverable over time.
Under the Hood
Documentation in dbt is stored as YAML metadata attached to models, sources, and columns. When you run `dbt docs generate`, dbt reads this metadata, combines it with the model definitions and with column information from the warehouse, and writes the result to artifact files (`manifest.json` and `catalog.json`). The docs site reads these artifacts to present descriptions, sources, and relationships in a searchable, browsable form, making data assets easy to find and understand.
Why designed this way?
dbt was designed to keep documentation close to the data transformation code to reduce mismatch and outdated info. Earlier approaches stored docs separately, causing confusion. Integrating docs with models ensures they evolve together, improving accuracy and discoverability.
┌───────────────┐      ┌───────────────┐
│  dbt Models   │─────▶│  YAML Docs    │
│ (SQL files)   │      │ (Descriptions)│
└──────┬────────┘      └──────┬────────┘
       │                      │
       │                      │
       ▼                      ▼
  dbt Compile & Build Docs Site
               │
               ▼
      Interactive Docs Website
               │
               ▼
       Data Discoverability
Myth Busters - 4 Common Misconceptions
Quick: Does documentation only help new users, or is it useful for experts too? Commit to your answer.
Common Belief: Documentation is only for beginners to understand data.
Reality: Documentation helps everyone, including experts, by saving time and preventing errors.
Why it matters: Ignoring documentation leads to repeated questions and mistakes even among experienced users.
Quick: Is documentation a one-time task or ongoing? Commit to your answer.
Common Belief: Once documentation is written, it doesn’t need updates.
Reality: Documentation must be updated continuously as data changes to remain useful.
Why it matters: Outdated documentation causes confusion and misuse of data.
Quick: Does documentation only describe data, or can it also show data relationships? Commit to your answer.
Common Belief: Documentation only explains what data is, not how it connects.
Reality: Good documentation also shows data lineage and relationships, aiding discovery.
Why it matters: Missing relationship info makes it hard to understand data context and impact.
Quick: Can documentation replace data quality checks? Commit to your answer.
Common Belief: If data is documented well, quality checks are less important.
Reality: Documentation and quality checks serve different purposes; both are essential.
Why it matters: Relying only on docs without checks risks trusting incorrect data.
Expert Zone
1. Documentation quality directly affects data catalog effectiveness and user adoption.
2. Embedding documentation in code (like dbt) reduces drift between data and docs, a common source of errors.
3. Interactive docs sites that show lineage help detect hidden dependencies and impact of changes.
When NOT to use
In very small projects with a single user, heavy documentation may be unnecessary; simple comments or notes suffice. For highly dynamic data where schemas change constantly, automated metadata tools might be better than manual docs.
Production Patterns
Teams integrate documentation writing into their dbt development workflow, using pull requests to update docs alongside code. They use dbt docs sites as a central data catalog and combine it with data quality tests and lineage tools for full data governance.
Connections
Data Catalogs
Documentation is a core part of data catalogs that organize and index data assets.
Understanding documentation helps grasp how data catalogs enable efficient data discovery and governance.
Software Documentation
Both explain complex systems to users, ensuring correct use and maintenance.
Knowing software docs principles improves writing clear, maintainable data documentation.
Library Classification Systems
Like documentation, classification systems organize information to make it findable.
Seeing documentation as an organizational system helps appreciate its role in managing data knowledge.
Common Pitfalls
#1 Writing documentation separately from data models.
Wrong approach: Creating a separate Word document to describe data tables without linking to dbt models.
Correct approach: Writing documentation in dbt YAML files alongside the model definitions.
Root cause: Believing documentation is a separate task rather than part of the data development process.
#2 Not updating documentation after data changes.
Wrong approach: Changing a dbt model’s SQL but leaving old descriptions in docs unchanged.
Correct approach: Updating the documentation YAML to reflect the new model logic and columns.
Root cause: Underestimating the importance of keeping docs in sync with data.
#3 Using vague or technical jargon in documentation.
Wrong approach: Describing a column as 'normalized metric for KPI aggregation' without explanation.
Correct approach: Describing the column as 'A calculated value showing average sales per customer, adjusted for seasonality.'
Root cause: Assuming all users have the same technical background.
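In a dbt properties file, the plain-language description from pitfall #3 could be recorded roughly as follows. The model name `sales_summary` and the column name `avg_sales_per_customer` are hypothetical, invented for this sketch.

```yaml
# Hypothetical properties file showing a plain-language column description
version: 2

models:
  - name: sales_summary                 # hypothetical model
    columns:
      - name: avg_sales_per_customer    # hypothetical column
        description: >
          A calculated value showing average sales per customer,
          adjusted for seasonality.
```

The folded block scalar (`>`) lets a longer description span several lines in the file while still being read as one paragraph.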
Key Takeaways
Documentation is essential to make data easy to find and understand for everyone.
Integrating documentation with data models, like in dbt, keeps information accurate and up-to-date.
Good documentation builds trust in data by explaining its origin, meaning, and transformations.
Interactive documentation sites help users explore data relationships and lineage visually.
Maintaining documentation is an ongoing process that requires discipline and team collaboration.