Naming conventions at scale in dbt - Deep Dive

Overview - Naming conventions at scale
What is it?
Naming conventions at scale are agreed rules for naming files, tables, columns, and models in large data projects. They help keep things clear and consistent when many people work together. Without clear names, it becomes hard to find, understand, or trust data. These conventions guide how to name things so everyone can easily read and use the data.
Why it matters
Without naming conventions, data projects become confusing and error-prone as they grow. Teams waste time guessing what data means or fixing mistakes caused by unclear names. Good naming conventions save time, reduce errors, and make collaboration smooth. They help data stay trustworthy and easy to maintain, even as projects get very large.
Where it fits
Before learning naming conventions, you should understand basic dbt concepts like models, sources, and how data flows. After mastering naming conventions, you can learn about advanced dbt features like testing, documentation, and deployment automation. Naming conventions are a foundation for clean, scalable data engineering.
Mental Model
Core Idea
Naming conventions are a shared language that keeps data organized and understandable as projects grow bigger and more complex.
Think of it like...
It's like organizing a huge library where every book has a clear label showing its genre, author, and topic so anyone can find the right book quickly without confusion.
┌──────────────────────────────────┐
│ Naming Conventions at Scale      │
├───────────────┬──────────────────┤
│ Scope         │ Examples         │
├───────────────┼──────────────────┤
│ Tables        │ sales_orders     │
│ Columns       │ order_date       │
│ Models        │ stg_customers    │
│ Files         │ 2024_01_load.sql │
└───────────────┴──────────────────┘
Build-Up - 7 Steps
1. Foundation: What are naming conventions
Concept: Naming conventions are simple rules for naming things in data projects.
In dbt, you create models, which dbt builds as tables or views with named columns. Naming conventions tell you how to name these objects so that everyone understands them. For example, a convention might prefix staging models with 'stg_' and require lowercase names with underscores.
Result
You get a clear, consistent way to name data objects that everyone can follow.
Understanding naming conventions early prevents confusion and makes teamwork easier.
2. Foundation: Common naming patterns in dbt
Concept: There are common patterns like prefixes and suffixes to show purpose or stage.
Examples:
- 'stg_' prefix for staging models (source data, lightly cleaned and renamed)
- 'int_' prefix for intermediate models (transformations that combine or reshape staging models)
- 'fct_' prefix for fact tables (events and measurements)
- 'dim_' prefix for dimension tables (descriptive attributes)
These patterns make a table's role identifiable at a glance.
Result
You can tell a table's role just by its name.
Using common patterns helps everyone quickly understand data structure without extra explanation.
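The prefix patterns above can be read mechanically as well as by eye. The following is a minimal Python sketch, not part of dbt itself, that maps a model name to the layer its prefix implies. The LAYER_PREFIXES table and the layer_of function are hypothetical names introduced for illustration.

```python
from typing import Optional

# Hypothetical prefix-to-layer table, taken from the patterns listed above.
LAYER_PREFIXES = {
    "stg_": "staging",
    "int_": "intermediate",
    "fct_": "fact",
    "dim_": "dimension",
}


def layer_of(model_name: str) -> Optional[str]:
    """Return the layer implied by the model's prefix, or None if no known prefix matches."""
    for prefix, layer in LAYER_PREFIXES.items():
        if model_name.startswith(prefix):
            return layer
    return None
```

A name such as 'stg_customers' resolves to the staging layer, while an unprefixed name such as 'customers' resolves to None, which is a useful signal that the model does not follow the convention.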
3. Intermediate: Scaling naming for large teams
Before reading on: do you think one naming rule fits all teams, or should it adapt as teams grow? Commit to your answer.
Concept: Naming conventions must adapt to team size and project complexity.
Small teams can get by with simple rules, but large teams need detailed conventions that cover schemas, environments, and ownership. For example, a team might add team initials or environment tags such as 'dev' or 'prod' to names to avoid conflicts.
Result
Naming conventions become flexible and scalable, reducing confusion in big teams.
Knowing that naming must evolve with scale prevents chaos and naming collisions in big projects.
4. Intermediate: Balancing readability and brevity
Before reading on: is it better to have very short names or very descriptive long names? Commit to your answer.
Concept: Good naming balances clear meaning with manageable length.
Very short names save typing but can be unclear. Very long names explain everything but are hard to read and type. For example, 'fct_sales' is clear and short, while 'fact_table_for_all_sales_transactions' is too long. Choose names that are descriptive but concise.
Result
Names that are easy to read, remember, and type without losing meaning.
Balancing length and clarity improves daily work speed and reduces errors.
5. Intermediate: Using namespaces and schemas
Concept: Namespaces or schemas group related tables to avoid name clashes and improve organization.
In dbt, you can use database schemas to separate data by team, environment, or domain. For example, 'sales.stg_orders' and 'marketing.stg_orders' can coexist without confusion. This adds a layer beyond naming to organize data.
Result
Better organization and fewer naming conflicts across teams and environments.
Understanding namespaces helps scale naming beyond just table names, improving project structure.
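How a schema name gets assembled is worth seeing concretely. The sketch below is written in Python and mirrors what dbt's built-in generate_schema_name macro is understood to do by default: use the target schema on its own when a model sets no custom schema, and otherwise join the two with an underscore. The function name resolve_schema and the example schema names are assumptions made for illustration; check the behavior against your dbt version before relying on it.

```python
from typing import Optional


def resolve_schema(target_schema: str, custom_schema: Optional[str]) -> str:
    """Combine the target schema with an optional custom schema.

    Assumption: this mirrors dbt's default generate_schema_name logic.
    """
    if custom_schema is None:
        # No custom schema configured: the model lands in the target schema.
        return target_schema
    # Custom schema configured: append it to the target schema.
    return f"{target_schema}_{custom_schema.strip()}"
```

With a target schema of 'analytics', a model configured with the custom schema 'sales' would be built in 'analytics_sales', and a developer whose target schema is 'dev_alice' would get 'dev_alice_sales'. This is how two teams can each own a 'stg_orders' model without colliding.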
6. Advanced: Automating naming with dbt macros
Before reading on: do you think naming should be manual or can it be automated? Commit to your answer.
Concept: dbt macros can automate naming to enforce conventions consistently.
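You can write macros that derive the names dbt uses in the warehouse from model properties such as folder or configuration. For example, overriding dbt's generate_alias_name macro can add a 'stg_' prefix to every staging model automatically, and overriding generate_schema_name controls which schema a model is built in. This reduces human error and keeps naming consistent.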
Result
Consistent, error-free naming applied automatically across the project.
Automating naming saves time and prevents mistakes, especially in large projects.
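Enforcement does not have to live in a macro. A complementary approach, sketched here in Python rather than in dbt's macro language, is a small check that runs in CI and flags model files whose names do not match the folder they live in. The folder names, the FOLDER_PREFIXES table, and the naming_violations function are all hypothetical; adapt them to your own project layout.

```python
from pathlib import PurePosixPath
from typing import Iterable, List

# Hypothetical mapping from model folder to the prefixes allowed in it.
FOLDER_PREFIXES = {
    "staging": ("stg_",),
    "intermediate": ("int_",),
    "marts": ("fct_", "dim_"),
}


def naming_violations(model_paths: Iterable[str]) -> List[str]:
    """Return the paths whose file name lacks a prefix allowed by its folder."""
    violations = []
    for raw in model_paths:
        path = PurePosixPath(raw)
        for folder, prefixes in FOLDER_PREFIXES.items():
            # path.stem is the file name without the .sql extension.
            if folder in path.parts and not path.stem.startswith(prefixes):
                violations.append(raw)
                break
    return violations
```

A CI job could call naming_violations on the list of changed model files and fail the build when the result is non-empty, so that naming problems are caught in review rather than in production.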
7. Expert: Handling legacy and evolving conventions
Before reading on: do you think naming conventions can change easily once set? Commit to your answer.
Concept: Changing naming conventions in mature projects requires careful planning to avoid breaking dependencies.
Legacy projects may have inconsistent names. Introducing new conventions means renaming tables and updating references carefully. Techniques include using aliases, deprecation periods, and communication with teams to migrate smoothly.
Result
A clean, updated naming system without disrupting existing workflows.
Knowing how to evolve naming conventions avoids costly errors and downtime in production.
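Before renaming anything, it helps to compute the full rename plan and see where the new convention collides with names that already exist. The Python sketch below assumes a hypothetical legacy convention in which staging models were prefixed 'staging_' instead of 'stg_'. The functions plan_renames and legacy_to_stg are illustrative names, not part of dbt.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def plan_renames(
    existing_names: Iterable[str],
    rename_rule: Callable[[str], str],
) -> Tuple[Dict[str, str], List[Tuple[str, str]]]:
    """Build an old-to-new rename map and list the renames that would collide."""
    names = list(existing_names)
    taken = set(names)
    plan: Dict[str, str] = {}
    collisions: List[Tuple[str, str]] = []
    for old in names:
        new = rename_rule(old)
        if new == old:
            continue  # already follows the new convention
        if new in taken or new in plan.values():
            # Target name is already in use, or another rename claimed it.
            collisions.append((old, new))
        else:
            plan[old] = new
    return plan, collisions


def legacy_to_stg(name: str) -> str:
    """Hypothetical rule: rewrite a legacy 'staging_' prefix to 'stg_'."""
    prefix = "staging_"
    return "stg_" + name[len(prefix):] if name.startswith(prefix) else name
```

Renames in the plan can proceed behind aliases during a deprecation period, while each collision needs a human decision before the migration starts.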
Under the Hood
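Naming conventions work by creating a shared vocabulary that all dbt models, sources, and tests follow. dbt itself does not enforce a convention. It takes each model's name from its file name, uses that name to resolve ref() calls and build the dependency graph, and by default creates a table or view with the same name. Because the same name appears in the SQL, in the warehouse, and in the generated documentation, a consistent convention carries through every place people encounter the data.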
Why designed this way?
Naming conventions were designed to solve confusion and errors in collaborative data projects. Early data teams faced chaos with random names, so conventions emerged to standardize communication. Alternatives like no rules or ad-hoc naming led to unmaintainable projects, so conventions became best practice.
┌───────────────┐
│ Naming Rules  │
├───────────────┤
│ Prefixes      │
│ Suffixes      │
│ Case Style    │
│ Separators    │
└──────┬────────┘
       │
       ▼
┌───────────────┐
│ dbt Model     │
│ Names         │
├───────────────┤
│ SQL Queries   │
│ Table Creation│
│ Documentation │
└───────────────┘
Myth Busters - 4 Common Misconceptions
Quick: Do you think naming conventions are only for big teams? Commit yes or no.
Common Belief: Naming conventions are only needed when many people work on a project.
Reality: Even solo data practitioners benefit from naming conventions to keep their work organized and understandable over time.
Why it matters: Ignoring naming conventions early leads to confusion and wasted time even for individuals, especially as projects grow.
Quick: Do you think longer names always mean better clarity? Commit yes or no.
Common Belief: Long, descriptive names are always better because they explain everything.
Reality: Overly long names become hard to read and type, reducing productivity and increasing errors.
Why it matters: Choosing overly long names slows down work and can cause mistakes in queries or documentation.
Quick: Do you think naming conventions can be ignored if you have good documentation? Commit yes or no.
Common Belief: Good documentation makes naming conventions unnecessary.
Reality: Documentation helps but cannot replace clear, consistent names that are visible everywhere in code and data.
Why it matters: Without naming conventions, documentation becomes harder to write and maintain, increasing confusion.
Quick: Do you think changing naming conventions mid-project is easy? Commit yes or no.
Common Belief: You can change naming conventions anytime without much trouble.
Reality: Changing conventions in mature projects is complex and risky, requiring careful coordination and testing.
Why it matters: Unplanned changes can break data pipelines and cause downtime, harming trust and productivity.
Expert Zone
1. Some teams use environment-specific suffixes (like '_dev' or '_prod') in names to manage parallel deployments without conflicts.
2. Naming conventions can encode metadata like data freshness or source system, enabling automated monitoring and alerts.
3. In multi-cloud or multi-database setups, naming conventions help unify naming across different platforms for easier cross-system queries.
When NOT to use
Strict naming conventions may be too rigid for very small or experimental projects where speed matters more than order. In such cases, lightweight or no conventions might be better until the project grows.
Production Patterns
Large companies use naming conventions combined with automated CI/CD pipelines that enforce naming rules via dbt macros and tests. They also integrate naming with data catalogs and governance tools to maintain data quality and compliance.
Connections
Software coding style guides
Naming conventions in dbt are similar to coding style guides in software development that enforce consistent variable and function names.
Understanding coding style guides helps appreciate why consistent naming reduces bugs and improves collaboration in data projects.
Library classification systems
Both naming conventions and library classification systems organize large collections for easy search and retrieval.
Seeing naming as a classification system highlights its role in making data discoverable and manageable at scale.
Linguistics - Controlled vocabularies
Naming conventions act like controlled vocabularies in linguistics, limiting word choices to reduce ambiguity.
Knowing about controlled vocabularies shows how limiting names improves clarity and communication in complex systems.
Common Pitfalls
#1 Using inconsistent naming styles across models and tables.
Wrong approach:
CREATE TABLE SalesOrders;
-- elsewhere
CREATE TABLE sales_orders;
Correct approach:
CREATE TABLE sales_orders; -- use lowercase with underscores everywhere
Root cause: Not agreeing on or enforcing a single naming style leads to confusion and errors.
#2 Making names too long and complex to describe everything.
Wrong approach:
CREATE TABLE fact_table_for_all_sales_transactions_in_2024;
Correct approach:
CREATE TABLE fct_sales_2024;
Root cause: Trying to encode too much detail in names makes them hard to read and use.
#3 Ignoring environment or team context in names, causing conflicts.
Wrong approach:
CREATE TABLE stg_orders; -- used by multiple teams/environments
Correct approach:
CREATE TABLE sales_dev.stg_orders; -- schema separates environment
Root cause: Not using namespaces or schemas to separate contexts leads to name clashes.
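The first pitfall above, mixed naming styles, is easy to detect and repair automatically. The Python sketch below checks whether a name is lowercase with underscores and converts CamelCase names to that style. The helper names is_snake_case and to_snake_case are hypothetical, and the conversion handles simple CamelCase only, not runs of capitals such as acronyms.

```python
import re

# Lowercase letters and digits, in groups separated by single underscores.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def is_snake_case(name: str) -> bool:
    """Return True if the name is lowercase with underscores."""
    return bool(SNAKE_CASE.match(name))


def to_snake_case(name: str) -> str:
    """Insert an underscore before each capital that follows a lowercase letter or digit, then lowercase."""
    separated = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    return separated.lower()
```

Applied to the example from the first pitfall, 'SalesOrders' fails the check and converts to 'sales_orders', which passes it.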
Key Takeaways
Naming conventions create a shared language that keeps data projects organized and understandable as they grow.
Good conventions balance clarity and brevity, making names easy to read and type without losing meaning.
As teams and projects scale, naming conventions must evolve to include namespaces, environment tags, and automation.
Changing naming conventions in mature projects requires careful planning to avoid breaking dependencies.
Consistent naming reduces errors, saves time, and improves collaboration, making data trustworthy and maintainable.