
Column descriptions in dbt - Deep Dive

Overview - Column descriptions
What is it?
Column descriptions are short explanations attached to each column in a database table or model. They tell you what the data in that column means in simple words. In dbt, these descriptions help document your data models clearly. This makes it easier for anyone using the data to understand what each column represents.
Why it matters
Without column descriptions, users often guess what data means, leading to mistakes and wasted time. Clear descriptions prevent confusion and errors when analyzing data. They also help teams share knowledge and maintain data quality as projects grow. This makes data trustworthy and easier to use for decision-making.
Where it fits
Before learning column descriptions, you should understand basic database tables and dbt models. After mastering descriptions, you can explore advanced data documentation, testing, and data cataloging tools. Column descriptions fit into the documentation and data governance part of the data workflow.
Mental Model
Core Idea
Column descriptions are simple labels that explain what each piece of data means, making data easy to understand and use.
Think of it like...
Think of column descriptions like labels on jars in a kitchen pantry. Without labels, you might confuse sugar with salt. The labels tell you exactly what's inside each jar so you can cook correctly.
Table: sales
┌───────────────┬────────────────────────────┐
│ Column        │ Description                │
├───────────────┼────────────────────────────┤
│ order_id      │ Unique ID for each order   │
│ customer_name │ Name of the customer       │
│ order_date    │ Date when order was placed │
│ total_amount  │ Total price of the order   │
└───────────────┴────────────────────────────┘
Build-Up - 6 Steps
1. Foundation: What are column descriptions
Concept: Introducing the idea of attaching simple explanations to each column in a data table.
A column description is a short text that explains what data is stored in that column. For example, a column named 'age' might have the description 'Age of the customer in years'. This helps anyone reading the data understand what the column means without guessing.
Result
You can clearly explain what each column in your data represents.
Understanding that data needs clear explanations prevents confusion and errors when sharing or analyzing data.
2. Foundation: How dbt stores column descriptions
Concept: Learning where and how dbt keeps these descriptions in your project.
In dbt, column descriptions are written in a YAML properties file that sits alongside the model, conventionally named schema.yml. Each model entry lists its columns, and each column has a name and a description field. Because the file lives in the project repository, descriptions are version-controlled with your code and easy to update.
Result
You know where to write and find column descriptions in dbt projects.
Keeping descriptions with code ensures documentation stays up-to-date and consistent.
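As a minimal sketch of what this looks like, the fragment below documents the sales table from the diagram above. The file path models/schema.yml, the model name sales, and the model-level description are assumptions for illustration; dbt reads any .yml file in the models directory, and schema.yml is only a common naming convention.

```yaml
# models/schema.yml (hypothetical path; any .yml file under models/ works)
version: 2

models:
  - name: sales                      # assumed to match a model file such as sales.sql
    description: "One row per customer order."   # illustrative model-level description
    columns:
      - name: order_id
        description: "Unique ID for each order"
      - name: customer_name
        description: "Name of the customer"
      - name: order_date
        description: "Date when order was placed"
      - name: total_amount
        description: "Total price of the order"
```

Each entry under columns is a list item (note the leading dash) with a name key and a description key. The name ties the description to the column, so it should match the column name the model produces.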
3. Intermediate: Writing clear and useful descriptions
Before reading on: Do you think descriptions should be very technical or simple and clear? Commit to your answer.
Concept: How to write descriptions that anyone can understand and that add real value.
Good descriptions are short, clear, and avoid jargon. They explain what the data means, not how it is calculated. For example, instead of 'Sum of sales transactions', write 'Total sales amount for the order'. Avoid repeating the column name or being too vague.
Result
Descriptions become helpful guides for anyone using the data.
Knowing how to write clear descriptions improves communication and reduces mistakes in data use.
4. Intermediate: Using descriptions for auto-generated docs
Before reading on: Do you think descriptions affect only code or also user-facing documentation? Commit to your answer.
Concept: Descriptions in dbt automatically appear in the generated documentation website.
Running 'dbt docs generate' builds the documentation artifacts, and 'dbt docs serve' hosts them as a local website that shows your models and columns with their descriptions. This makes it easy for data users to explore and understand your data without reading code.
Result
Your documentation website shows helpful column explanations.
Understanding this connection motivates keeping descriptions accurate and complete.
5. Advanced: Maintaining descriptions in large projects
Before reading on: Do you think descriptions are a one-time task or need ongoing updates? Commit to your answer.
Concept: Best practices for keeping descriptions accurate as data models evolve.
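In big projects, data models change often, and descriptions must change with them to avoid confusion. Check descriptions during code review. Missing descriptions can also be caught automatically, for example with a pre-commit hook or a CI script that inspects dbt's parsed project metadata. Note that dbt data tests validate the data itself, not the documentation, and spotting an outdated description generally still needs a human reviewer.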
Result
Descriptions stay reliable and useful over time.
Knowing that documentation is a living part of the project prevents decay and builds trust in data.
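One possible way to automate the check for missing descriptions is a pre-commit configuration using the open-source dbt-checkpoint hooks. This is a sketch under assumptions, not a recommended setup: the rev tag and the files pattern are illustrative, and you should confirm the hook ids against the dbt-checkpoint documentation for the version you install.

```yaml
# .pre-commit-config.yaml (sketch; verify hook ids and pin a real release tag)
repos:
  - repo: https://github.com/dbt-checkpoint/dbt-checkpoint
    rev: v2.0.1                       # assumed tag for illustration; pin to a release you have verified
    hooks:
      - id: dbt-parse                 # parses the project so the hooks below have a manifest to read
      - id: check-model-has-description
        files: ^models/               # hypothetical scope: every model under models/
      - id: check-model-columns-have-desc
        files: ^models/
```

A check like this only detects descriptions that are absent. It cannot tell whether an existing description still matches the column, so it complements code review rather than replacing it.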
6. Expert: Surprising effects of missing descriptions
Before reading on: Do you think missing descriptions only cause minor inconvenience or can lead to serious errors? Commit to your answer.
Concept: How missing or wrong descriptions can cause costly mistakes in data analysis and business decisions.
Without descriptions, analysts may misinterpret columns, leading to wrong conclusions. For example, confusing 'order_date' with 'shipment_date' can affect sales reports. This can cause bad business decisions, lost revenue, or compliance issues.
Result
You realize documentation is critical for data quality and business success.
Understanding the real risks of poor documentation motivates prioritizing clear column descriptions.
Under the Hood
dbt stores column descriptions as metadata in YAML files linked to each model. When dbt parses the project, it reads this metadata and attaches it to the corresponding model in the manifest artifact (manifest.json in the target directory). Running 'dbt docs generate' adds a catalog artifact (catalog.json) built from the warehouse, and the documentation site reads both files to display each column with its description. This separation of code and metadata allows descriptions to be maintained independently but stay connected to the data logic.
Why designed this way?
Storing descriptions in YAML keeps documentation human-readable and version-controlled alongside code. This design avoids mixing documentation with SQL code, making it easier to update and review. It also allows dbt to generate rich documentation websites automatically, improving collaboration and transparency.
┌───────────────┐       ┌───────────────┐       ┌───────────────┐
│ schema.yml    │──────▶│ dbt compiler  │──────▶│ docs website  │
│ (descriptions)│       │ (parses YAML) │       │ (shows desc)  │
└───────────────┘       └───────────────┘       └───────────────┘
Myth Busters - 4 Common Misconceptions
Quick: Do you think column descriptions are only useful for new team members? Commit yes or no.
Common Belief: Column descriptions are just for beginners or new team members to understand data.
Reality: Descriptions help everyone, including experienced analysts and engineers, by reducing guesswork and errors.
Why it matters: Ignoring descriptions can cause even experts to misinterpret data, leading to costly mistakes.
Quick: Do you think descriptions should explain how data is calculated? Commit yes or no.
Common Belief: Descriptions must include detailed calculation logic for each column.
Reality: Descriptions should explain what the data means, not how it is calculated. Calculation details belong in the model's SQL, in code comments, or in separate documentation.
Why it matters: Mixing calculation logic into descriptions makes them long, confusing, and less useful for quick understanding.
Quick: Do you think missing descriptions have no impact if column names are clear? Commit yes or no.
Common Belief: If column names are clear, descriptions are unnecessary.
Reality: Even clear names can be ambiguous or misunderstood; descriptions provide essential context and prevent errors.
Why it matters: Relying only on names risks misinterpretation, especially in complex or evolving datasets.
Quick: Do you think descriptions are automatically updated when data changes? Commit yes or no.
Common Belief: Descriptions update themselves when the underlying data or models change.
Reality: Descriptions must be manually maintained; dbt does not auto-update them.
Why it matters: Outdated descriptions cause confusion and reduce trust in data documentation.
Expert Zone
1. Descriptions can include markdown formatting to add links or emphasize important points, enhancing readability.
2. Some teams use descriptions as a lightweight data dictionary, while others combine them with external catalog tools for richer metadata management.
3. In multi-environment setups, descriptions remain consistent across environments, helping maintain clarity despite data differences.
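To illustrate the first point, here is a small sketch of a description that uses markdown. The model name sales and the glossary URL are placeholders invented for this example. The folded block scalar (>) lets the text span several lines in the YAML file while still being read as one paragraph.

```yaml
version: 2

models:
  - name: sales                      # hypothetical model name
    columns:
      - name: total_amount
        description: >
          Total price of the order. See the
          [pricing glossary](https://example.com/glossary#total-amount)
          for how **tax and discounts** are treated.
```

The link and the bold text are rendered on the documentation site. Keep such formatting light so the description still reads well as plain text in the YAML file.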
When NOT to use
Column descriptions are not a substitute for data lineage or detailed data quality tests. For complex transformations or business logic, use separate documentation or dbt tests. Also, avoid overloading descriptions with technical details better suited for code comments.
Production Patterns
In production, teams integrate column descriptions into automated documentation pipelines, enforce description presence with automated checks in CI, and review descriptions during code reviews. They also link descriptions to business glossaries and data catalogs to align technical and business language.
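One way descriptions can reach tools outside dbt is the persist_docs config, which asks dbt to write descriptions into the warehouse as table and column comments. The fragment below is a sketch that assumes a project named my_project; support for column-level comments varies by adapter, so check your adapter's documentation before relying on it.

```yaml
# dbt_project.yml (fragment)
models:
  my_project:            # hypothetical project name; use the name from your own dbt_project.yml
    +persist_docs:
      relation: true     # write model descriptions as table/view comments
      columns: true      # write column descriptions as column comments
```

Catalog tools that read warehouse comments can then pick up the same text that appears in the dbt docs site, so there is a single source for each description.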
Connections
Data dictionaries
Column descriptions are the building blocks of data dictionaries.
Understanding column descriptions helps grasp how data dictionaries organize and explain datasets for users.
Software documentation
Both provide explanations to help users understand complex systems.
Knowing how clear documentation improves software usability helps appreciate the value of column descriptions in data.
User interface labels
Column descriptions serve a similar role as labels and tooltips in user interfaces.
Recognizing this connection shows how good labeling improves user experience across fields.
Common Pitfalls
#1 Writing vague or redundant descriptions that repeat the column name.
Wrong approach: order_date: 'Date of order_date column'
Correct approach: order_date: 'The date when the customer placed the order'
Root cause: Misunderstanding that descriptions should explain meaning, not restate the name.
#2 Leaving descriptions empty or missing for many columns.
Wrong approach: customer_id: ''
Correct approach: customer_id: 'Unique identifier for each customer'
Root cause: Underestimating the importance of documentation or lack of process to enforce it.
#3 Including complex calculation logic inside descriptions.
Wrong approach: total_sales: 'Sum of all sales transactions where status is complete and date is in current month'
Correct approach: total_sales: 'Total sales amount for completed orders in the current month'
Root cause: Confusing description purpose with technical implementation details.
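The pitfall examples above use a shorthand "column: description" notation for brevity. In an actual dbt properties file, each column is a list item with name and description keys. A minimal sketch of the three corrected descriptions, assuming a hypothetical model named orders that contains all three columns:

```yaml
version: 2

models:
  - name: orders                     # hypothetical model name for illustration
    columns:
      - name: order_date
        description: "The date when the customer placed the order"
      - name: customer_id
        description: "Unique identifier for each customer"
      - name: total_sales
        description: "Total sales amount for completed orders in the current month"
```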
Key Takeaways
Column descriptions explain what each piece of data means, making data easier to understand and use.
In dbt, descriptions live in YAML files alongside models, keeping documentation close to code.
Clear, simple descriptions improve communication and reduce errors for all data users.
Descriptions power auto-generated documentation websites, increasing data transparency.
Maintaining accurate descriptions is essential to prevent costly misunderstandings in data analysis.

Practice

1. What is the main purpose of adding column descriptions in dbt?
easy
A. To change the data type of columns
B. To create new columns in the model
C. To explain what each column means for better understanding
D. To write SQL queries inside the YAML file

Solution

  1. Step 1: Understand the role of column descriptions

    Column descriptions provide explanations about what each column represents in the data model.
  2. Step 2: Differentiate from other YAML uses

    They do not change data types, create columns, or contain SQL code; they only describe columns.
  3. Final Answer:

    To explain what each column means for better understanding -> Option C
  4. Quick Check:

    Column descriptions = explain columns [OK]
Hint: Descriptions explain columns, not change data or structure [OK]
Common Mistakes:
  • Thinking descriptions change data types
  • Confusing descriptions with SQL code
  • Assuming descriptions create new columns
2. Which of the following is the correct syntax to add a column description in a dbt YAML file?
easy
A.
description:
  customer_id: 'Unique ID for each customer'
B.
columns:
  - name: customer_id
    description: 'Unique ID for each customer'
C.
columns:
  customer_id: 'Unique ID for each customer'
D.
columns:
  - customer_id: 'Unique ID for each customer'

Solution

  1. Step 1: Recall YAML structure for columns in dbt

    The correct format uses a list under columns: with each item having name and description keys.
  2. Step 2: Compare options to correct format

    Option B matches the correct YAML syntax: a dash-prefixed list item under columns with name and description keys, properly indented.
  3. Final Answer:

    A columns list whose item has name and description keys -> Option B
  4. Quick Check:

    YAML columns list with name and description = Option B [OK]
Hint: Use dash list with name and description keys in YAML [OK]
Common Mistakes:
  • Using key-value pairs without dash list
  • Putting description outside columns section
  • Incorrect indentation or missing name key
3. Given this YAML snippet in a dbt model:
columns:
  - name: order_id
    description: 'Unique order identifier'
  - name: order_date
    description: 'Date when order was placed'
What will dbt show for the order_date column in documentation?
medium
A. No description available
B. Unique order identifier
C. order_date
D. Date when order was placed

Solution

  1. Step 1: Locate the description for order_date

    The YAML shows order_date has description 'Date when order was placed'.
  2. Step 2: Understand dbt documentation behavior

    The docs site displays the text from that column's description field; it does not substitute the column name or another column's description.
  3. Final Answer:

    Date when order was placed -> Option D
  4. Quick Check:

    dbt docs show column description text [OK]
Hint: dbt docs show the description text, not column name [OK]
Common Mistakes:
  • Confusing column name with description
  • Assuming no description is shown even though one is present
  • Picking wrong description text
4. You wrote this YAML for column descriptions but dbt docs shows no descriptions:
columns:
  - name: user_id
    description 'User unique ID'
What is the error causing descriptions not to appear?
medium
A. Missing colon after description key
B. Wrong indentation of columns
C. Missing dash before name
D. Description text should be uppercase

Solution

  1. Step 1: Check YAML syntax for description key

    The line description 'User unique ID' is missing a colon after description.
  2. Step 2: Understand YAML parsing impact

    Without the colon, the YAML is invalid, so dbt cannot parse the file (it typically reports a parsing error) and the description never reaches the docs.
  3. Final Answer:

    Missing colon after description key -> Option A
  4. Quick Check:

    YAML keys need colon after them [OK]
Hint: Always put colon after YAML keys like description [OK]
Common Mistakes:
  • Forgetting colon after keys
  • Incorrect indentation
  • Assuming case sensitivity matters
5. You want to add descriptions for multiple columns in a dbt model YAML file. Which approach correctly documents two columns product_id and price with descriptions, ensuring dbt docs will display them properly?
hard
A.
columns:
  - name: product_id
    description: 'ID of the product'
  - name: price
    description: 'Price in USD'
B.
columns:
  product_id: 'ID of the product'
  price: 'Price in USD'
C.
columns:
  - product_id: 'ID of the product'
  - price: 'Price in USD'
D.
columns:
  name: product_id
  description: 'ID of the product'
  name: price
  description: 'Price in USD'

Solution

  1. Step 1: Recall correct YAML list format for multiple columns

    Each column must be an item in a list with name and description keys.
  2. Step 2: Evaluate each option's structure

    Option A correctly uses a list with two items under columns, each having name and description keys properly indented.
  3. Final Answer:

    A columns list with two items, each with name and description keys -> Option A
  4. Quick Check:

    List of columns with name and description keys = Option A [OK]
Hint: Use dash list with name and description for each column [OK]
Common Mistakes:
  • Using key-value pairs without dash list
  • Repeating keys without list items
  • Incorrect indentation breaking YAML