Overview - Naming conventions at scale

What is it?

Naming conventions at scale are agreed rules for naming files, tables, columns, and models in large data projects. They help keep things clear and consistent when many people work together. Without clear names, it becomes hard to find, understand, or trust data. These conventions guide how to name things so everyone can easily read and use the data.

Why it matters

Without naming conventions, data projects become confusing and error-prone as they grow. Teams waste time guessing what data means or fixing mistakes caused by unclear names. Good naming conventions save time, reduce errors, and make collaboration smooth. They help data stay trustworthy and easy to maintain, even as projects get very large.

Where it fits

Before learning naming conventions, you should understand basic dbt concepts like models, sources, and how data flows. After mastering naming conventions, you can learn about advanced dbt features like testing, documentation, and deployment automation. Naming conventions are a foundation for clean, scalable data engineering.

Mental Model

Core Idea

Naming conventions are a shared language that keeps data organized and understandable as projects grow bigger and more complex.

Think of it like...

It's like organizing a huge library where every book has a clear label showing its genre, author, and topic so anyone can find the right book quickly without confusion.

┌──────────────────────────────────┐
│ Naming Conventions at Scale      │
├───────────────┬──────────────────┤
│ Scope         │ Examples         │
├───────────────┼──────────────────┤
│ Tables        │ sales_orders     │
│ Columns       │ order_date       │
│ Models        │ stg_customers    │
│ Files         │ 2024_01_load.sql │
└───────────────┴──────────────────┘

Build-Up - 7 Steps

1

Foundation: What are naming conventions

Concept: Naming conventions are simple rules for naming things in data projects.

In dbt, you define models that are built as tables or views, each with its own columns. Naming conventions tell you how to name these so everyone understands them. For example, a team might prefix staging models with 'stg_' and write every name in lowercase with underscores.
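As a minimal sketch of the two example rules, the Python check below tests whether a name is lowercase with underscores and whether a staging name carries the 'stg_' prefix. The function names and regular expressions are illustrative assumptions, not part of dbt.

```python
import re

# Assumed rule: starts with a lowercase letter, then lowercase letters, digits, or underscores.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9_]*$")
# Assumed rule: the 'stg_' prefix followed by at least one more character.
STAGING = re.compile(r"^stg_[a-z0-9][a-z0-9_]*$")


def is_snake_case(name: str) -> bool:
    """Return True if the name is lowercase with underscore separators."""
    return SNAKE_CASE.fullmatch(name) is not None


def is_valid_staging_name(name: str) -> bool:
    """Return True if the name is lowercase and starts with the 'stg_' prefix."""
    return STAGING.fullmatch(name) is not None


for candidate in ["stg_customers", "StgCustomers", "customers"]:
    print(candidate, is_valid_staging_name(candidate))
```

Only the first candidate passes: the second breaks the case rule and the third lacks the prefix.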

Result

You get a clear, consistent way to name data objects that everyone can follow.

Understanding naming conventions early prevents confusion and makes teamwork easier.

2

Foundation: Common naming patterns in dbt

3

Intermediate: Scaling naming for large teams

4

Intermediate: Balancing readability and brevity

5

Intermediate: Using namespaces and schemas

6

Advanced: Automating naming with dbt macros

7

Expert: Handling legacy and evolving conventions

Under the Hood

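Naming conventions work by creating a shared vocabulary that all dbt models, sources, and tests follow. dbt does not enforce a convention itself, but it relies on names throughout: a model's name comes from its file name, ref() and source() calls resolve dependencies by name, and by default the model name becomes the name of the table or view in the warehouse. Because the same names appear in the compiled SQL, the dependency graph, and the generated documentation, a consistent scheme makes all three easier to read.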
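To make the link between names and dependencies concrete, here is a deliberately simplified Python sketch. It is not how dbt is implemented; the Model class, relation_name, unresolved_refs, and the 'analytics' schema are hypothetical. It only illustrates that a ref() call is looked up by model name, so renaming a model leaves every reference to the old name dangling.

```python
import re
from dataclasses import dataclass

# Simplified pattern for {{ ref('model_name') }} with single or double quotes.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")


@dataclass
class Model:
    name: str    # derived from the file name, e.g. stg_customers.sql -> stg_customers
    schema: str  # hypothetical target schema
    sql: str     # model SQL that may contain ref() calls


def relation_name(model: Model) -> str:
    """Return the schema-qualified name the model would be built as."""
    return f"{model.schema}.{model.name}"


def unresolved_refs(models: list[Model]) -> dict[str, list[str]]:
    """Map each model name to the ref() targets that match no known model."""
    known = {m.name for m in models}
    missing: dict[str, list[str]] = {}
    for m in models:
        bad = [target for target in REF_PATTERN.findall(m.sql) if target not in known]
        if bad:
            missing[m.name] = bad
    return missing


models = [
    Model("stg_customers", "analytics", "select * from raw.customers"),
    Model("dim_customers", "analytics", "select * from {{ ref('stg_customers') }}"),
]
print(relation_name(models[1]))   # analytics.dim_customers
print(unresolved_refs(models))    # {}

# Renaming the staging model without updating the ref() leaves a dangling reference.
models[0].name = "staging_customers"
print(unresolved_refs(models))    # {'dim_customers': ['stg_customers']}
```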

Why designed this way?

Naming conventions were designed to solve confusion and errors in collaborative data projects. Early data teams faced chaos with random names, so conventions emerged to standardize communication. Alternatives like no rules or ad-hoc naming led to unmaintainable projects, so conventions became best practice.

┌───────────────┐
│ Naming Rules  │
├───────────────┤
│ Prefixes      │
│ Suffixes      │
│ Case Style    │
│ Separators    │
└──────┬────────┘
       │
       ▼
┌───────────────┐
│ dbt Model     │
│ Names         │
├───────────────┤
│ SQL Queries   │
│ Table Creation│
│ Documentation │
└───────────────┘
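The flow in the diagram, from rules to model names, can be sketched in Python as a small rules object that builds names. NamingRules and its fields are hypothetical names chosen to mirror the four boxes (prefixes, suffixes, case style, separators); the 'daily' suffix is an assumed example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NamingRules:
    prefix: str = ""        # e.g. "stg" for staging models
    suffix: str = ""        # e.g. "daily" as a grain marker (assumed example)
    separator: str = "_"    # character placed between name parts
    lowercase: bool = True  # case style: force lowercase when True

    def build(self, *parts: str) -> str:
        """Join prefix, parts, and suffix with the separator, skipping empty pieces."""
        pieces = [self.prefix, *parts, self.suffix]
        name = self.separator.join(p for p in pieces if p)
        return name.lower() if self.lowercase else name


staging = NamingRules(prefix="stg")
print(staging.build("customers"))                                # stg_customers
print(NamingRules(prefix="fct", suffix="daily").build("Sales"))  # fct_sales_daily
```

Keeping the rules in one object means a change to the separator or case style happens in one place instead of in every name.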

Myth Busters - 4 Common Misconceptions

Quick: Do you think naming conventions are only for big teams? Commit yes or no.

Common Belief: Naming conventions are only needed when many people work on a project.


Quick: Do you think longer names always mean better clarity? Commit yes or no.

Common Belief: Long, descriptive names are always better because they explain everything.


Quick: Do you think naming conventions can be ignored if you have good documentation? Commit yes or no.

Common Belief: Good documentation makes naming conventions unnecessary.


Quick: Do you think changing naming conventions mid-project is easy? Commit yes or no.

Common Belief: You can change naming conventions anytime without much trouble.


Expert Zone

1

Some teams use environment-specific suffixes (like '_dev' or '_prod') in names to manage parallel deployments without conflicts.

2

Naming conventions can encode metadata like data freshness or source system, enabling automated monitoring and alerts.

3

In multi-cloud or multi-database setups, naming conventions help unify naming across different platforms for easier cross-system queries.
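As one way to picture metadata encoded in names, the sketch below parses a model name into its parts so a monitoring script could group models by source system. It assumes a pattern of layer, single underscore, source, double underscore, entity (for example 'stg_salesforce__accounts'); that pattern, the single-word source, and the names NameParts and parse_model_name are assumptions for illustration.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern: <layer>_<source>__<entity>, e.g. stg_salesforce__accounts.
# The source system is assumed to be a single word without underscores.
NAME_PATTERN = re.compile(
    r"^(?P<layer>[a-z]+)_(?P<source>[a-z0-9]+)__(?P<entity>[a-z0-9][a-z0-9_]*)$"
)


class NameParts(NamedTuple):
    layer: str
    source: str
    entity: str


def parse_model_name(name: str) -> Optional[NameParts]:
    """Split a model name into layer, source system, and entity, or return None."""
    match = NAME_PATTERN.fullmatch(name)
    if match is None:
        return None
    return NameParts(match["layer"], match["source"], match["entity"])


print(parse_model_name("stg_salesforce__accounts"))
print(parse_model_name("stg_customers"))  # None: no source segment to extract
```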

When NOT to use

Strict naming conventions may be too rigid for very small or experimental projects where speed matters more than order. In such cases, lightweight or no conventions might be better until the project grows.

Production Patterns

Large companies use naming conventions combined with automated CI/CD pipelines that enforce naming rules via dbt macros and tests. They also integrate naming with data catalogs and governance tools to maintain data quality and compliance.
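A naming check in CI can be as small as the sketch below. It is a standalone Python script, not a dbt macro or test, and the allowed prefixes ('stg', 'int', 'fct', 'dim') are an assumed convention. It scans a directory for .sql files and reports any whose file name breaks the rule.

```python
import re
from pathlib import Path

# Assumed convention: <layer prefix>_<lowercase_with_underscores>.sql
ALLOWED_PREFIXES = ("stg", "int", "fct", "dim")
MODEL_NAME = re.compile(r"^(" + "|".join(ALLOWED_PREFIXES) + r")_[a-z0-9][a-z0-9_]*$")


def find_violations(models_dir: Path) -> list[str]:
    """Return sorted paths of .sql files whose file stem breaks the naming rule."""
    return sorted(
        str(path)
        for path in models_dir.rglob("*.sql")
        if MODEL_NAME.fullmatch(path.stem) is None
    )


def exit_code(models_dir: Path) -> int:
    """Print each violation and return 1 if any exist, else 0 (for use in CI)."""
    violations = find_violations(models_dir)
    for path in violations:
        print(f"naming violation: {path}")
    return 1 if violations else 0
```

A pipeline step could pass the result of exit_code(Path("models")) to sys.exit so that a badly named file fails the build before it is merged.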

Connections

Software coding style guides

Naming conventions in dbt are similar to coding style guides in software development that enforce consistent variable and function names.

Understanding coding style guides helps appreciate why consistent naming reduces bugs and improves collaboration in data projects.

Library classification systems

Both naming conventions and library classification systems organize large collections for easy search and retrieval.

Seeing naming as a classification system highlights its role in making data discoverable and manageable at scale.

Linguistics - Controlled vocabularies

Naming conventions act like controlled vocabularies in linguistics, limiting word choices to reduce ambiguity.

Knowing about controlled vocabularies shows how limiting names improves clarity and communication in complex systems.

Common Pitfalls

#1 Using inconsistent naming styles across models and tables.

Wrong approach: CREATE TABLE SalesOrders (order_date DATE); /* and elsewhere */ CREATE TABLE sales_orders (order_date DATE);

Correct approach: CREATE TABLE sales_orders (order_date DATE); -- use lowercase with underscores everywhere

Root cause: Not agreeing on or enforcing a single naming style leads to confusion and errors.
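A short Python sketch shows how mixed styles can be detected: normalize every name to lowercase with underscores and report names that collapse to the same result. The helper names to_snake_case and find_style_clashes are hypothetical.

```python
import re


def to_snake_case(name: str) -> str:
    """Convert CamelCase, spaces, or hyphens to lowercase with underscores."""
    # Insert an underscore between a lowercase letter or digit and an uppercase letter.
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    # Split an acronym from the next capitalized word: "HTTPResponse" -> "HTTP_Response".
    name = re.sub(r"(?<=[A-Z])(?=[A-Z][a-z])", "_", name)
    # Replace runs of spaces and hyphens with a single underscore.
    name = re.sub(r"[\s\-]+", "_", name)
    return name.strip("_").lower()


def find_style_clashes(names: list[str]) -> dict[str, list[str]]:
    """Group names that differ only in style, keyed by their normalized form."""
    groups: dict[str, list[str]] = {}
    for name in names:
        groups.setdefault(to_snake_case(name), []).append(name)
    return {key: group for key, group in groups.items() if len(group) > 1}


print(find_style_clashes(["SalesOrders", "sales_orders", "customers"]))
# {'sales_orders': ['SalesOrders', 'sales_orders']}
```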

#2 Making names too long and complex in an attempt to describe everything.

Wrong approach: CREATE TABLE fact_table_for_all_sales_transactions_in_2024 (order_date DATE);

Correct approach: CREATE TABLE fct_sales_2024 (order_date DATE);

Root cause: Trying to encode too much detail in names makes them hard to read and use.
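One hedged way to catch overlong names early is a small review function like the Python sketch below. The 30-character budget and the list of filler words are assumptions a team would set for itself; they are not limits imposed by dbt or by any warehouse.

```python
MAX_LENGTH = 30  # assumed team budget, not a dbt or warehouse limit
FILLER_WORDS = {"table", "for", "all", "in", "of", "the", "data"}  # assumed list


def review_name(name: str) -> list[str]:
    """Return human-readable warnings for a name that is too long or too wordy."""
    warnings = []
    if len(name) > MAX_LENGTH:
        warnings.append(f"{len(name)} characters exceeds the budget of {MAX_LENGTH}")
    fillers = [word for word in name.split("_") if word in FILLER_WORDS]
    if fillers:
        warnings.append("filler words: " + ", ".join(fillers))
    return warnings


for candidate in ["fact_table_for_all_sales_transactions_in_2024", "fct_sales_2024"]:
    print(candidate, review_name(candidate))
```

The long name triggers both warnings, while the short one returns an empty list.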

#3 Ignoring environment or team context in names, which causes conflicts.

Wrong approach: CREATE TABLE stg_orders (order_date DATE); -- same unqualified name used by multiple teams and environments

Correct approach: CREATE TABLE sales_dev.stg_orders (order_date DATE); -- the schema separates the environment

Root cause: Not using namespaces or schemas to separate contexts leads to name clashes.
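The effect of a schema prefix can be sketched in a few lines of Python. The helper below builds names in the 'sales_dev.stg_orders' form used above; the second team, 'finance', and the function names are hypothetical.

```python
from collections import Counter


def qualified_name(team: str, environment: str, model: str) -> str:
    """Return '<team>_<environment>.<model>', for example 'sales_dev.stg_orders'."""
    return f"{team}_{environment}.{model}"


def find_clashes(names: list[str]) -> list[str]:
    """Return the names that occur more than once, sorted."""
    return sorted(name for name, count in Counter(names).items() if count > 1)


# Two hypothetical teams both build a model called stg_orders in dev.
owners = [("sales", "dev"), ("finance", "dev")]
bare = ["stg_orders" for _ in owners]
qualified = [qualified_name(team, env, "stg_orders") for team, env in owners]

print(find_clashes(bare))       # ['stg_orders']
print(find_clashes(qualified))  # []
```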

Key Takeaways

Naming conventions create a shared language that keeps data projects organized and understandable as they grow.

Good conventions balance clarity and brevity, making names easy to read and type without losing meaning.

As teams and projects scale, naming conventions must evolve to include namespaces, environment tags, and automation.

Changing naming conventions in mature projects requires careful planning to avoid breaking dependencies.

Consistent naming reduces errors, saves time, and improves collaboration, making data trustworthy and maintainable.