
Cross-team model sharing in dbt - Deep Dive

Overview - Cross-team model sharing
What is it?
Cross-team model sharing in dbt means that different teams within an organization can use and build upon each other's data models. Instead of each team creating their own separate data transformations, they share reusable models to maintain consistency and save time. This approach helps teams collaborate better and ensures everyone works with the same trusted data. It is like sharing building blocks to create bigger, more complex data structures together.
Why it matters
Without cross-team model sharing, teams often duplicate work, create conflicting data definitions, and waste time fixing inconsistencies. This leads to confusion and mistrust in data across the company. Sharing models helps everyone speak the same data language, speeds up analytics, and improves decision-making. It turns data work from isolated silos into a collaborative effort that benefits the whole organization.
Where it fits
Before learning cross-team model sharing, you should understand basic dbt concepts like models, sources, and dependencies. After mastering sharing, you can explore advanced topics like dbt packages, version control integration, and automated testing. This topic sits in the middle of the dbt learning path, bridging individual model creation and large-scale data collaboration.
Mental Model
Core Idea
Cross-team model sharing is about building a shared library of trusted data transformations that multiple teams can use and extend to work together efficiently.
Think of it like...
Imagine a group of chefs in a kitchen sharing a common set of recipes. Instead of each chef inventing their own dish from scratch, they use and improve shared recipes to create consistent meals faster and with less waste.
┌───────────────────────────────┐
│     Shared Model Library      │
├───────────────┬───────────────┤
│ Team A        │ Team B        │
│ uses models   │ uses models   │
│ from library  │ from library  │
├───────────────┴───────────────┤
│ Teams collaborate by sharing  │
│ and reusing data models       │
└───────────────────────────────┘
Build-Up - 6 Steps
1
Foundation: Understanding dbt Model Basics
Concept: Learn what dbt models are and how they transform raw data into usable tables.
In dbt, a model is a SQL file containing a SELECT statement that defines a transformation on your raw data. When you execute dbt run, dbt wraps each query in the statements needed to create a table or view in your data warehouse. Models are the building blocks of your data pipeline.
Result
You can create simple tables in your warehouse by writing SQL in dbt models and running dbt commands.
Understanding models is essential because they are the units you will share across teams.
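To make the "a model is just a SELECT" idea concrete, here is a minimal Python sketch that uses the standard-library sqlite3 module as a stand-in warehouse. The materialize helper and the table names are hypothetical, and the statements dbt actually generates differ by adapter; this only illustrates the general shape of wrapping a query in CREATE ... AS.

```python
import sqlite3


def materialize(conn, name, select_sql, materialized="view"):
    """Wrap a model's SELECT in a CREATE statement (illustrative only)."""
    if materialized not in ("view", "table"):
        raise ValueError(f"unsupported materialization: {materialized}")
    conn.execute(f"CREATE {materialized.upper()} {name} AS {select_sql}")


# Stand-in warehouse with some raw data (hypothetical table and values).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_sales (order_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO raw_sales VALUES (?, ?)",
    [(1, 10.0), (2, 15.5), (3, 4.5)],
)

# The "model" is only the SELECT; the tool supplies the CREATE around it.
sales_summary_sql = (
    "SELECT COUNT(*) AS orders, SUM(amount) AS revenue FROM raw_sales"
)
materialize(conn, "sales_summary", sales_summary_sql, materialized="view")

summary = conn.execute("SELECT orders, revenue FROM sales_summary").fetchone()
print(summary)
```

Switching materialized to "table" stores the result instead of recomputing it on every read, which is the same trade-off a model's materialization setting controls.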
2
Foundation: How Dependencies Link Models
Concept: Discover how dbt models can depend on each other to build complex data pipelines.
Models can reference other models using the {{ ref('model_name') }} function. This creates a dependency graph where dbt knows the order to run models. For example, a sales_summary model can depend on a raw_sales model.
Result
dbt runs models in the correct order, ensuring data flows from raw to final tables.
Knowing dependencies helps you understand how shared models can be reused safely without breaking pipelines.
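The ordering described above can be sketched in a few lines of Python. This is not dbt's implementation: it assumes single-argument ref() calls only, finds them with a regular expression, and sorts the resulting graph with the standard-library graphlib module. The model names and SQL are hypothetical.

```python
import re
from graphlib import TopologicalSorter

# Matches {{ ref('model_name') }} with either quote style.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")


def run_order(models):
    """Return model names ordered so each model comes after the models it refs."""
    graph = {name: set(REF_PATTERN.findall(sql)) for name, sql in models.items()}
    return list(TopologicalSorter(graph).static_order())


models = {
    "sales_report": "select * from {{ ref('sales_summary') }} where revenue > 0",
    "sales_summary": (
        "select region, sum(amount) as revenue "
        "from {{ ref('raw_sales') }} group by region"
    ),
    "raw_sales": "select * from warehouse.sales",
}

order = run_order(models)
print(order)
```

Because the graph is derived from the SQL itself, adding a new ref() to a model is enough to change the run order; nobody maintains the ordering by hand.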
3
Intermediate: Introducing Cross-team Model Sharing
Before reading on: do you think teams should copy models or reference shared models? Commit to your answer.
Concept: Learn how teams can share models by referencing each other's work instead of duplicating SQL code.
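Instead of copying SQL code, teams can reference models created by other teams using dbt's ref function. If Team A builds a customer model, Team B can use it in their transformations by referring to it. Within a single project, {{ ref('model_name') }} is enough; when the model comes from an installed package or another project, the two-argument form {{ ref('package_name', 'model_name') }} names its origin explicitly. Either way, the SQL lives in one place, which avoids duplication and keeps data consistent.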
Result
Teams build on each other's models, reducing repeated work and improving data trust.
Understanding sharing as referencing rather than copying prevents data silos and duplication.
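As a rough illustration of referencing rather than copying, the Python sketch below resolves ref() calls to relation names. It handles both the one-argument form and the two-argument form ref('package_or_project', 'model'). The project names, relation names, and the compile_refs helper are all hypothetical, and real dbt compilation does considerably more than a string substitution.

```python
import re

# Matches {{ ref('model') }} and {{ ref('package', 'model') }}.
REF_CALL = re.compile(
    r"\{\{\s*ref\(\s*'([^']+)'\s*(?:,\s*'([^']+)'\s*)?\)\s*\}\}"
)


def compile_refs(sql, relations, current_project):
    """Replace each ref() call with the relation name it points to."""

    def resolve(match):
        first, second = match.group(1), match.group(2)
        # One argument: a model in the current project.
        # Two arguments: (package_or_project, model).
        project, model = (first, second) if second else (current_project, first)
        return relations[(project, model)]

    return REF_CALL.sub(resolve, sql)


# Hypothetical lookup of where each team's model is built in the warehouse.
relations = {
    ("team_a_core", "customers"): "analytics.core.customers",
    ("team_b_marketing", "campaigns"): "analytics.marketing.campaigns",
}

team_b_model = (
    "select c.customer_id, m.campaign_id "
    "from {{ ref('team_a_core', 'customers') }} as c "
    "join {{ ref('campaigns') }} as m on m.customer_id = c.customer_id"
)

compiled = compile_refs(team_b_model, relations, current_project="team_b_marketing")
print(compiled)
```

Team B's SQL never mentions where Team A's table physically lives, so Team A can move or rebuild it without Team B editing anything.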
4
Intermediate: Using dbt Packages for Sharing
Before reading on: do you think sharing models requires manual copying or can it be automated? Commit to your answer.
Concept: Explore how dbt packages enable teams to share models as reusable code libraries.
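dbt packages are collections of models, macros, and tests that can be installed into other dbt projects. Teams publish their models as packages, and consumers declare them as dependencies in a packages.yml file at the root of their project (not in dbt_project.yml), then install them by running dbt deps. This automates distribution and makes the version each project depends on explicit.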
Result
Teams can easily install, update, and use shared models across projects.
Knowing about packages reveals how sharing scales beyond simple references to organized, versioned libraries.
5
Advanced: Managing Versioning and Compatibility
Before reading on: do you think updating shared models always breaks downstream projects? Commit to your answer.
Concept: Learn how to handle changes in shared models without breaking dependent teams' work.
When teams share models via packages, they use version numbers to control updates. Semantic versioning helps teams know if updates are safe (patch), add features (minor), or break compatibility (major). Teams can pin package versions to avoid unexpected breaks and plan upgrades carefully.
Result
Teams maintain stable pipelines while benefiting from improvements in shared models.
Understanding versioning prevents common pitfalls of breaking data pipelines in collaborative environments.
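The patch/minor/major distinction can be expressed as a small Python check. This sketch assumes plain MAJOR.MINOR.PATCH version strings with no pre-release suffixes, and the function names are hypothetical; a team might run something like it before accepting an automated upgrade of a shared package.

```python
def parse_version(text):
    """Turn 'MAJOR.MINOR.PATCH' into a tuple of integers."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def classify_bump(installed, candidate):
    """Classify the change from the installed version to a candidate version."""
    old, new = parse_version(installed), parse_version(candidate)
    if new <= old:
        return "none"
    if new[0] != old[0]:
        return "major"  # may break compatibility
    if new[1] != old[1]:
        return "minor"  # adds features, should stay compatible
    return "patch"      # fixes only


def safe_to_auto_upgrade(installed, candidate):
    """Allow unattended upgrades only for patch and minor releases."""
    return classify_bump(installed, candidate) in ("patch", "minor")


for candidate in ("1.2.4", "1.3.0", "2.0.0"):
    print(candidate, classify_bump("1.2.3", candidate))
```

Treating every major bump as a planned migration, rather than something that arrives by surprise, is the practical effect of pinning.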
6
Expert: Optimizing Cross-team Collaboration Workflows
Before reading on: do you think cross-team sharing only involves code, or also communication and governance? Commit to your answer.
Concept: Discover best practices for collaboration, governance, and testing in shared model environments.
Beyond code sharing, teams establish clear ownership, documentation, and testing standards for shared models. Automated CI/CD pipelines run tests on shared models before publishing. Communication channels coordinate changes and feedback. Governance policies define who can update shared models and how.
Result
Cross-team sharing becomes reliable, scalable, and trusted across the organization.
Knowing that sharing is as much about process as code helps avoid chaos and builds a culture of data collaboration.
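Part of that process can be automated. The Python sketch below shows one possible pre-publish check that a CI job might run: it flags shared models that lack a description or a named owner. The input is assumed to be model entries already parsed into dictionaries, and storing the owner under a meta.owner key is a convention assumed here for illustration, not a requirement of dbt.

```python
def governance_issues(models):
    """List metadata gaps that should block publishing a shared model."""
    issues = []
    for model in models:
        name = model.get("name", "<unnamed>")
        if not (model.get("description") or "").strip():
            issues.append(f"{name}: missing description")
        if not (model.get("meta") or {}).get("owner"):
            issues.append(f"{name}: missing meta.owner")
    return issues


# Hypothetical entries, shaped like a parsed list of model properties.
shared_models = [
    {
        "name": "customers",
        "description": "One row per customer.",
        "meta": {"owner": "team-a"},
    },
    {"name": "orders", "description": "", "meta": {}},
]

issues = governance_issues(shared_models)
for issue in issues:
    print(issue)
```

A pipeline could refuse to publish a new package version whenever the returned list is non-empty, which turns the ownership and documentation standards into something enforced rather than merely agreed upon.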
Under the Hood
dbt compiles each model's Jinja-templated SQL into an executable query and builds a dependency graph from the ref() calls it finds. A shared package is itself a dbt project, so it carries its models together with a name and version declared in its own dbt_project.yml. Running dbt deps resolves the packages a project declares and installs the requested versions into the project's dbt_packages directory. During a run, dbt executes models in dependency order, so shared models are built before the models that depend on them.
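The version-resolution step can be pictured with a simplified Python sketch: given the versions a package has published and a list of constraints, pick the highest version that satisfies all of them. This is an assumption-laden illustration, not dbt's resolver; it handles only MAJOR.MINOR.PATCH strings, and the resolve function and version lists are hypothetical.

```python
import operator

# Two-character operators are listed first so ">=" is not read as ">".
OPERATORS = [
    (">=", operator.ge),
    ("<=", operator.le),
    ("==", operator.eq),
    (">", operator.gt),
    ("<", operator.lt),
]


def parse_version(text):
    """Turn 'MAJOR.MINOR.PATCH' into a tuple of integers."""
    return tuple(int(part) for part in text.strip().split("."))


def satisfies(version, constraint):
    """Check one version against one constraint such as '>=1.2.0'."""
    for symbol, compare in OPERATORS:
        if constraint.startswith(symbol):
            bound = parse_version(constraint[len(symbol):])
            return compare(parse_version(version), bound)
    # A bare version is treated as an exact pin.
    return parse_version(version) == parse_version(constraint)


def resolve(available, constraints):
    """Return the highest available version meeting every constraint."""
    matches = [v for v in available if all(satisfies(v, c) for c in constraints)]
    if not matches:
        raise LookupError(f"no version satisfies {constraints}")
    return max(matches, key=parse_version)


available = ["1.1.0", "1.2.3", "1.3.0", "2.0.0"]
print(resolve(available, [">=1.2.0", "<2.0.0"]))
print(resolve(available, ["1.2.3"]))
```

A range lets a project pick up compatible fixes automatically while still excluding the next major release; an exact pin trades that convenience for full reproducibility.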
Why designed this way?
dbt was designed to treat data transformations as code, enabling software engineering best practices like modularity, version control, and testing. Sharing models as packages follows software library patterns, making data pipelines more maintainable and collaborative. Alternatives like copying SQL code were error-prone and hard to maintain, so dbt's package system was created to solve these issues.
┌───────────────┐       ┌───────────────┐       ┌───────────────┐
│ Team A Model │──────▶│ Package Repo  │──────▶│ Team B Project│
│ (customer)   │       │ (versioned)   │       │ (uses model)  │
└───────────────┘       └───────────────┘       └───────────────┘
        │                      ▲                       │
        │                      │                       │
        └──────────────────────┴───────────────────────┘
                 dbt package sharing and installation
Myth Busters - 3 Common Misconceptions
Quick: Do you think copying SQL code between teams is better than referencing shared models? Commit yes or no.
Common Belief: Copying SQL code between teams is fine and ensures independence.
Reality: Copying code leads to duplication, inconsistent data, and extra maintenance work.
Why it matters: Teams waste time fixing bugs in multiple places and data consumers get confused by conflicting definitions.
Quick: Do you think updating a shared model always breaks all dependent projects? Commit yes or no.
Common Belief: Any change to a shared model will break all teams using it.
Reality: With proper versioning and testing, updates can be safe and backward compatible.
Why it matters: Believing this causes teams to avoid improvements and creates stagnation in data pipelines.
Quick: Do you think cross-team sharing is only about code and not about communication? Commit yes or no.
Common Belief: Sharing models is just about sharing SQL files or packages.
Reality: Effective sharing requires communication, governance, and collaboration processes.
Why it matters: Ignoring this leads to confusion, duplicated effort, and broken pipelines.
Expert Zone
1
Shared models often require clear ownership and SLAs to ensure timely updates and support.
2
Semantic versioning in dbt packages is critical but often misunderstood, leading to accidental breaking changes.
3
Testing shared models with both unit and integration tests prevents subtle bugs that affect multiple teams.
When NOT to use
Cross-team model sharing is not ideal for highly experimental or rapidly changing models where stability is low. In such cases, teams should keep models isolated until they mature. Also, for very small teams or solo projects, the overhead of sharing may not be worth it.
Production Patterns
Organizations use dbt packages to create centralized data marts that multiple analytics teams consume. They implement CI/CD pipelines that run tests on shared models before publishing new versions. Governance teams manage package releases and documentation to ensure data quality and trust.
Connections
Software Package Management
Cross-team model sharing in dbt uses package management concepts similar to software libraries.
Understanding software package management helps grasp how dbt packages handle versioning, dependencies, and distribution of data models.
Modular Programming
Sharing models is like modular programming where code is split into reusable components.
Knowing modular programming principles clarifies why breaking data transformations into shared models improves maintainability and collaboration.
Collaborative Knowledge Sharing
Cross-team sharing parallels how teams share knowledge and best practices in organizations.
Recognizing this connection highlights that sharing models is not just technical but also cultural, requiring communication and governance.
Common Pitfalls
#1: Duplicating SQL code instead of referencing shared models.
Wrong approach: SELECT * FROM raw_sales -- hardcoded table name, copied and modified in multiple projects without ref()
Correct approach: SELECT * FROM {{ ref('raw_sales') }} -- references the shared model, so dbt tracks the dependency
Root cause: Misunderstanding that sharing means copying code rather than referencing reusable models.
#2: Not pinning package versions, causing unexpected breaks.
Wrong approach:
packages:
  - package: 'company/shared_models'
    version: '*'
Correct approach:
packages:
  - package: 'company/shared_models'
    version: '1.2.3'
Root cause: Ignoring semantic versioning and trusting latest versions blindly.
#3: Ignoring communication and governance when sharing models.
Wrong approach: Teams publish shared models without documentation or coordination.
Correct approach: Teams establish ownership, document models, and coordinate changes via meetings or tools.
Root cause: Assuming sharing is only a technical problem, not a social one.
Key Takeaways
Cross-team model sharing in dbt enables teams to reuse and build upon each other's data transformations, improving consistency and efficiency.
Sharing models by referencing and using dbt packages avoids duplication and helps maintain a single source of truth.
Proper versioning and testing of shared models prevent breaking changes and ensure stable data pipelines.
Effective sharing requires not only technical tools but also communication, governance, and collaboration processes.
Understanding software package management and modular programming concepts deepens the grasp of cross-team model sharing.