Overview - Data model best practices

What is it?

A data model is a structured way to organize and connect data tables so that you can analyze information easily. Data model best practices are guidelines to build these models efficiently, making reports faster and easier to understand. They help ensure your data is accurate, consistent, and ready for insightful visualizations. Good data models reduce errors and improve performance in tools like Power BI.

Why it matters

Without good data models, reports can be slow, confusing, or wrong. Imagine trying to find answers in a messy pile of papers versus a well-organized filing cabinet. Poor models make it hard to trust your insights and waste time fixing problems. Best practices help you build models that save time, avoid mistakes, and make your data work for you.

Where it fits

Before learning data model best practices, you should understand basic data concepts like tables, columns, and relationships. After mastering these practices, you can learn advanced topics like DAX calculations, performance tuning, and complex data transformations.

Mental Model

Core Idea

A good data model is like a clean, well-organized map that connects all your data points clearly and efficiently for fast and accurate insights.

Think of it like...

Think of a data model like a city's road map: well-planned roads (relationships) connect neighborhoods (tables) so you can travel (analyze) quickly without getting lost or stuck in traffic.

┌─────────────┐     ┌─────────────┐
│  Customers  │────▶│  Orders     │
└─────────────┘     └─────────────┘
       │                   │
       ▼                   ▼
┌─────────────┐     ┌─────────────┐
│  Geography  │     │  Products   │
└─────────────┘     └─────────────┘

Build-Up - 7 Steps

1

FoundationUnderstand tables and columns

Concept: Learn what tables and columns are and how they store data.

In Power BI, data is stored in tables, similar to spreadsheets. Each table has columns that hold specific types of information, like names, dates, or numbers. For example, a 'Customers' table might have columns for Customer ID, Name, and City.

Result

You can identify and organize your raw data into tables with clear columns.

Understanding tables and columns is the base for building any data model because all data lives here.

2

FoundationLearn relationships between tables

3

IntermediateUse star schema design

4

IntermediateAvoid calculated columns when possible

5

IntermediateSet correct data types and formats

6

AdvancedOptimize model size and performance

7

ExpertManage relationships and filter directions carefully

Under the Hood

Power BI stores data in a compressed, columnar format called VertiPaq. It builds relationships by linking keys between tables, enabling fast filtering and aggregation. When you create visuals, Power BI uses these relationships and the VertiPaq engine to quickly retrieve and calculate data based on user interactions.

Why designed this way?

The star schema and columnar storage were chosen to optimize speed and simplicity. Columnar compression reduces memory use, and clear relationships avoid complex joins. This design balances flexibility with performance, unlike older row-based or flat table models.

┌───────────────┐
│   VertiPaq   │
│  Compression │
└──────┬────────┘
       │
┌──────▼────────┐
│  Column Store │
└──────┬────────┘
       │
┌──────▼────────┐
│ Relationships │
│  (Keys & FKs) │
└──────┬────────┘
       │
┌──────▼────────┐
│  Query Engine │
│  (Filters &   │
│  Aggregations)│
└───────────────┘

Myth Busters - 4 Common Misconceptions

Quick: Do you think adding more calculated columns always improves report speed? Commit to yes or no.

Common Belief:More calculated columns make reports faster because calculations are pre-made.

Tap to reveal reality

Quick: Do you think relationships set to both directions always fix filtering issues? Commit to yes or no.

Common Belief:Setting relationships to both directions is always better for filtering data correctly.

Tap to reveal reality

Quick: Do you think a flat table with all data is better than a star schema? Commit to yes or no.

Common Belief:Putting all data in one big table is simpler and better for analysis.

Tap to reveal reality

Quick: Do you think data types don’t affect performance or accuracy? Commit to yes or no.

Common Belief:Data types are just labels and don’t impact how data works or performs.

Tap to reveal reality

Expert Zone

1

Using inactive relationships with USERELATIONSHIP in DAX allows flexible model behavior without cluttering the model with multiple active links.

2

Reducing cardinality in columns (like grouping many unique values) drastically improves compression and query speed, often overlooked by beginners.

3

Composite models let you combine DirectQuery and Import modes, balancing real-time data needs with performance, but require careful design to avoid slowdowns.

When NOT to use

Avoid star schema when working with highly normalized source systems requiring many-to-many relationships; instead, use snowflake schema or advanced DAX techniques. Also, avoid complex bidirectional filters in large models; use explicit measures or aggregation tables instead.

Production Patterns

Professionals often build a clean star schema with fact and dimension tables, use measures over calculated columns, optimize model size by removing unused columns, and carefully manage relationship directions. Incremental refresh and aggregations are common for large datasets to keep reports responsive.

Connections

Database Normalization

Data model best practices build on normalization principles to organize data efficiently.

Understanding normalization helps you design dimension tables that avoid redundancy and improve data quality.

Software Engineering Modular Design

Both concepts emphasize breaking complex systems into clear, manageable parts.

Seeing data models as modular components helps maintain and scale BI solutions like well-designed software.

Urban Planning

Like city planners design roads and zones for smooth traffic, data modelers design tables and relationships for smooth data flow.

Recognizing this connection highlights the importance of clear structure and flow in complex systems.

Common Pitfalls

#1Creating many-to-many relationships without understanding their impact.

Wrong approach:Setting multiple many-to-many relationships active between tables without filters.

Correct approach:Use bridge tables or star schema design to avoid many-to-many relationships or set them inactive and use DAX to control filtering.

Root cause:Misunderstanding how many-to-many relationships affect filter context and performance.

#2Using text columns as keys in relationships.

Wrong approach:Linking tables using text columns like customer names instead of numeric IDs.

Correct approach:Use unique numeric keys (IDs) for relationships to ensure accuracy and performance.

Root cause:Not knowing that text keys are slower and prone to errors due to duplicates or typos.

#3Leaving unused columns in the model.

Wrong approach:Importing entire source tables with all columns, even those not used in reports.

Correct approach:Remove unused columns during data load to reduce model size and improve speed.

Root cause:Assuming more data is always better without considering performance impact.

Key Takeaways

A well-designed data model organizes data into clear tables connected by relationships, enabling fast and accurate analysis.

Using a star schema with fact and dimension tables simplifies your model and improves performance.

Measures are usually better than calculated columns for calculations because they keep the model smaller and faster.

Setting correct data types and managing relationship directions carefully prevents errors and slowdowns.

Optimizing model size by removing unused columns and reducing cardinality keeps reports responsive and scalable.