
What is schema.yml in dbt: Purpose and Usage Explained

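schema.yml is a YAML configuration file in dbt used to define tests, documentation, and metadata for your models and sources. It helps ensure data quality and provides clear descriptions of the tables and columns in your dbt project. The file name is a convention rather than a requirement: dbt reads any .yml file in your model directories, so you can name it differently or split it into several files.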

How It Works

Think of schema.yml as a blueprint or a guidebook for your data models in dbt. It tells dbt what tests to run on your data and how to describe your tables and columns so others can understand them easily. Just like a recipe card lists ingredients and steps, schema.yml lists your data model's structure and quality checks.

When you run dbt test, dbt reads this file and runs the tests it lists, such as checking that a column has no missing values or that its values are unique. When you run dbt docs generate, dbt uses the descriptions to build a documentation website, making your data easier to trust and use.


Example

This example shows a simple schema.yml file defining tests and descriptions for a model named orders.
```yaml
version: 2
models:
  - name: orders
    description: "This table contains customer orders data."
    columns:
      - name: id
        description: "Unique identifier for each order."
        tests:
          - unique
          - not_null
      - name: order_date
        description: "Date when the order was placed."
        tests:
          - not_null
```
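
The same file can describe sources as well as models. As a minimal sketch, a source entry might look like the following. The source name raw_sales, the schema name sales, and the table name raw_orders are hypothetical and serve only as illustration.

```yaml
version: 2

sources:
  - name: raw_sales            # hypothetical source name
    schema: sales              # hypothetical schema in the warehouse
    description: "Raw sales data loaded from the order system."
    tables:
      - name: raw_orders       # hypothetical table name
        description: "One row per order, as loaded."
        columns:
          - name: id
            description: "Unique identifier for each order."
            tests:
              - unique
              - not_null
```

A model could then select from this table with {{ source('raw_sales', 'raw_orders') }}. Recent dbt versions also accept data_tests: in place of tests: as the key name, so check the documentation for the version you run.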

When to Use

Use schema.yml whenever you want to improve your data project's quality and clarity. It is essential for adding tests that catch errors early, like missing or duplicate data. It also helps document your data models so teammates and stakeholders understand what each table and column means.

In real life, if you manage sales data, you can use schema.yml to ensure order IDs are unique and dates are always present. This prevents mistakes and builds trust in your reports.
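
Besides unique and not_null, dbt ships two other generic tests: accepted_values and relationships. The sketch below applies them to the sales scenario. It assumes a hypothetical status column, a hypothetical customer_id column, and a hypothetical customers model with an id column, none of which come from the example above.

```yaml
version: 2

models:
  - name: orders
    columns:
      - name: status                 # hypothetical column
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'returned']   # assumed status values
      - name: customer_id            # hypothetical column
        tests:
          - relationships:
              to: ref('customers')   # hypothetical model
              field: id
```

The first test fails if status holds a value outside the list. The second fails if an order refers to a customer_id that does not exist in customers. The exact argument layout can differ between dbt versions.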

Key Points

  • Defines tests: Checks data quality, such as uniqueness and the absence of null values.
  • Documents models: Adds descriptions for tables and columns.
  • Supports dbt docs: Powers auto-generated data documentation websites.
  • YAML format: Easy to read and write configuration file.

Key Takeaways

  • schema.yml is used to define tests and documentation for dbt models and sources.
  • It helps catch data quality issues early by specifying tests like unique and not_null.
  • It improves collaboration by providing clear descriptions for tables and columns.
  • The file is written in YAML format and is part of your dbt project structure.