Recall & Review
beginner
What is a raw data contract in the context of data sources?
A raw data contract is an agreement or set of rules that defines the expected structure, format, and quality of raw data coming from a source before it is processed or transformed.
beginner
Why do data teams define raw data contracts for sources?
To ensure data consistency, reliability, and to catch errors early by setting clear expectations about the data's format and quality before it enters the transformation pipeline.
intermediate
How do raw data contracts help in data transformation workflows?
They act as a checkpoint that validates incoming data, preventing bad or unexpected data from causing errors downstream in transformations or analyses.
intermediate
What might happen if raw data contracts are not defined for sources?
Without raw data contracts, data inconsistencies or errors can go unnoticed, leading to incorrect analysis, broken pipelines, and loss of trust in data.
beginner
Give an example of a rule that might be included in a raw data contract.
An example rule could be: "The column 'user_id' must always be a non-null integer," ensuring that every record has a valid user identifier.
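The example rule above could be written down in a dbt source file roughly as follows. This is a minimal sketch: the source name raw_data and table name users are placeholders, and the data_type line only documents the expected type, while the not_null test is what actually gets checked when tests run.

```yaml
version: 2

sources:
  - name: raw_data            # placeholder source name
    tables:
      - name: users           # placeholder table name
        columns:
          - name: user_id
            data_type: integer   # documents the expected type; not a check on its own
            tests:
              - not_null         # fails if any row has a null user_id
```

Recent dbt versions also accept data_tests: as the key name in place of tests:.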
What is the main purpose of defining raw data contracts for sources?
A. To create visualizations directly from raw data
B. To speed up data transformation by skipping validation
C. To store data in a compressed format
D. To ensure data meets expected structure and quality before processing
Answer: D
Raw data contracts set expectations for data format and quality to catch issues early.
Which of the following is NOT a benefit of raw data contracts?
A. Improved data reliability
B. Early detection of data errors
C. Automatic data visualization
D. Clear communication of data expectations
Answer: C
Raw data contracts do not create visualizations; they focus on data quality and structure.
What could happen if raw data contracts are missing?
A. Data will automatically be corrected
B. Data pipelines may break due to unexpected data
C. Data will be faster to process without checks
D. Data will be encrypted
Answer: B
Without contracts, unexpected data can cause errors in pipelines.
A raw data contract might specify that a column must be:
A. Non-null and of a specific data type
B. Encrypted
C. Randomly generated
D. Always null
Answer: A
Contracts define expected data types and nullability to ensure data quality.
In dbt, why is defining raw data contracts important before transformations?
A. To avoid errors and maintain trust in transformed data
B. To skip testing transformed data
C. To reduce storage costs
D. To create dashboards automatically
Answer: A
Contracts help catch issues early, ensuring transformations work on clean data.
Explain why defining raw data contracts for sources is important in a data pipeline.
Think about what happens if bad data enters your system.
Describe what kind of rules might be included in a raw data contract.
Consider how you would check if data is 'correct' before using it.
Practice
1. Why do we define raw data contracts in dbt sources?
easy
A. To set clear expectations for the raw data coming into the system
B. To speed up the data loading process
C. To automatically fix data errors
D. To create visual reports from raw data
Solution
Step 1: Understand the purpose of raw data contracts
Raw data contracts define what the incoming data should look like, such as expected columns and types.
Step 2: Identify the main benefit in dbt context
They help teams know what to expect and catch issues early, not speed up loading or fix errors automatically.
Final Answer:
To set clear expectations for the raw data coming into the system -> Option A
Quick Check:
Raw data contracts = clear data expectations [OK]
Hint: Raw data contracts = clear data rules for sources [OK]
Common Mistakes:
Thinking contracts speed up data loading
Assuming contracts fix data automatically
Confusing contracts with reporting tools
2. Which of the following is the correct way to define a source in a dbt YAML file for raw data contracts?
easy
A.
    source:
      name: raw_data
      table: users
      columns:
        - id
      tests: [not_null, unique]
B.
    sources:
      - name: raw_data
        tables:
          - name: users
            columns:
              - name: id
                tests: [not_null, unique]
C.
    sources:
      raw_data:
        users:
          columns:
            - id
          tests: [not_null, unique]
D.
    source:
      - raw_data
      - users
      - columns:
          - id
        tests: [not_null, unique]
Solution
Step 1: Recall dbt source YAML structure
Sources are defined under sources: as a list with name and tables keys.
Step 2: Match correct indentation and keys
The following definition correctly uses the sources list, with name, tables, and columns keys, and tests attached to the column:
    sources:
      - name: raw_data
        tables:
          - name: users
            columns:
              - name: id
                tests: [not_null, unique]
Final Answer:
    sources:
      - name: raw_data
        tables:
          - name: users
            columns:
              - name: id
                tests: [not_null, unique]
-> Option B
Quick Check:
dbt source YAML = list with name, tables, columns [OK]
Hint: Sources use list with name and tables keys in YAML [OK]
When running dbt, you get a syntax error. What is the problem?
medium
A. The tests list should be inside square brackets [ ]
B. The 'columns' key should be 'column'
C. The 'tables' key should be a dictionary, not a list
D. The source name cannot be 'raw_data'
Solution
Step 1: Check YAML syntax for tests
Tests must be listed as a YAML list inside square brackets or as a list with dashes.
Step 2: Identify the error in tests format
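Writing tests: not_null, unique does not produce a list. YAML reads it as the single string "not_null, unique", so dbt cannot interpret it as two separate tests; it should be tests: [not_null, unique].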
Final Answer:
The tests list should be inside square brackets [ ] -> Option A
Quick Check:
Tests need brackets or dashes in YAML [OK]
Hint: Tests must be a list with brackets or dashes [OK]
Common Mistakes:
Writing tests as comma-separated string without brackets
Using wrong key names like 'column' instead of 'columns'
Misunderstanding list vs dictionary in YAML
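The two valid list forms mentioned in the solution can be compared side by side. This sketch reuses the raw_data source and users table from the earlier question; the email column is a hypothetical addition, included only to show the dash form next to the bracket form.

```yaml
version: 2

sources:
  - name: raw_data
    tables:
      - name: users
        columns:
          - name: id
            tests: [not_null, unique]   # flow style: inline list in square brackets
          - name: email                 # hypothetical second column, for comparison
            tests:                      # block style: one dash per list item
              - not_null
              - unique
          # Not a list: the form below is read as the single string
          # "not_null, unique" rather than as two tests.
          #   tests: not_null, unique
```

Both styles give the same list, so the choice is a matter of readability: brackets suit short lists, dashes suit tests that take their own arguments.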
5. You want to ensure your raw data source in dbt matches a strict contract: every 'order_id' must be unique and not null, and 'order_date' must be present and in date format. How should you define this in your source YAML to catch issues early?
hard
A. Define the source with columns and add tests only for 'order_id' as unique, ignoring 'order_date'
B. Define the source with columns but no tests; rely on downstream models to catch errors
C. Define the source with columns 'order_id' and 'order_date' and add tests: 'order_id' with [not_null, unique], 'order_date' with [not_null, accepted_values] for dates
D. Define the source with columns and add tests for 'order_date' only, ignoring 'order_id'
Solution
Step 1: Identify required tests for 'order_id'
To ensure uniqueness and no nulls, use tests [not_null, unique] on 'order_id'.
Step 2: Define tests for 'order_date'
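To ensure presence, use not_null. For the date format, note that the built-in accepted_values test only checks a column against a fixed list of allowed values, so it is not a practical date check on its own; a custom test for date format is the better fit. Option C is still the only choice that tests both columns.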
Step 3: Combine tests in source YAML
Include both columns with their respective tests to catch issues early at the source level.
Final Answer:
Define the source with columns 'order_id' and 'order_date' and add tests: 'order_id' with [not_null, unique], 'order_date' with [not_null, accepted_values] for dates -> Option C
Quick Check:
Raw data contracts include all critical tests [OK]
Hint: Test all critical columns with not_null and uniqueness [OK]
Common Mistakes:
Skipping tests on important columns
Relying on downstream models for raw data validation
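A minimal sketch of how the two column rules in this question could look in a source YAML file. The source name raw_data and table name orders are placeholders, and is_valid_date is a hypothetical custom generic test standing in for the date-format check; it would have to be defined in the project before this definition runs.

```yaml
version: 2

sources:
  - name: raw_data              # placeholder source name
    tables:
      - name: orders            # placeholder table name
        columns:
          - name: order_id
            tests:
              - not_null        # every order must have an id
              - unique          # no id may appear twice
          - name: order_date
            tests:
              - not_null        # the date must be present
              - is_valid_date   # hypothetical custom generic test, not built into dbt
```

With a definition like this in place, running dbt test --select source:raw_data checks the raw table before any downstream model depends on it.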