Recall & Review
beginner
What is a raw data contract in the context of data sources?
A raw data contract is an agreement or set of rules that defines the expected structure, format, and quality of raw data coming from a source before it is processed or transformed.
beginner
Why do data teams define raw data contracts for sources?
To keep data consistent and reliable, and to catch errors early, by setting clear expectations about the data's format and quality before it enters the transformation pipeline.
intermediate
How do raw data contracts help in data transformation workflows?
They act as a checkpoint that validates incoming data, preventing bad or unexpected data from causing errors downstream in transformations or analyses.
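The checkpoint idea can be sketched in a few lines of plain Python. This is only an illustration, not how dbt implements source checks: the `REQUIRED_COLUMNS` set, the `ContractViolation` exception, and the `checkpoint` and `transform` functions are all hypothetical names invented for this example.

```python
from typing import Any

Row = dict[str, Any]

# Hypothetical contract: the columns every raw row is assumed to carry.
REQUIRED_COLUMNS = {"user_id", "email"}


class ContractViolation(Exception):
    """Raised when an incoming row breaks the raw data contract."""


def checkpoint(rows: list[Row]) -> list[Row]:
    """Return the rows unchanged if all pass; raise on the first violation."""
    for index, row in enumerate(rows):
        missing = REQUIRED_COLUMNS - set(row)
        if missing:
            raise ContractViolation(f"row {index} is missing {sorted(missing)}")
    return rows


def transform(rows: list[Row]) -> list[Row]:
    """Stand-in for a downstream transformation: normalise email casing."""
    return [{**row, "email": row["email"].lower()} for row in rows]


good = [{"user_id": 1, "email": "Ada@Example.com"}]
print(transform(checkpoint(good)))
```

Because `checkpoint` runs before `transform`, a row without an `email` column fails with a clear contract error instead of a `KeyError` deep inside the transformation.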
intermediate
What might happen if raw data contracts are not defined for sources?
Without raw data contracts, data inconsistencies or errors can go unnoticed, leading to incorrect analysis, broken pipelines, and loss of trust in data.
beginner
Give an example of a rule that might be included in a raw data contract.
An example rule could be: "The column 'user_id' must always be a non-null integer," ensuring that every record has a valid user identifier.
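As a rough sketch, the `user_id` rule could be checked per record like this in Python. The function name `violates_user_id_rule` and the dictionary-shaped records are assumptions made for illustration; a real project would more likely express the rule in its pipeline tool's own test or schema syntax.

```python
from typing import Any


def violates_user_id_rule(record: dict[str, Any]) -> bool:
    """Return True when 'user_id' is missing, null, or not an integer."""
    value = record.get("user_id")
    if value is None:
        return True
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool):
        return True
    return not isinstance(value, int)


records = [
    {"user_id": 42},
    {"user_id": None},
    {"user_id": "42"},
    {},
]
bad_records = [r for r in records if violates_user_id_rule(r)]
print(f"{len(bad_records)} of {len(records)} records break the rule")
```

Treating a missing key the same as a null value is a design choice here; a stricter contract might report the two cases separately.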
What is the main purpose of defining raw data contracts for sources?
Raw data contracts set expectations for data format and quality to catch issues early.
Which of the following is NOT a benefit of raw data contracts?
Raw data contracts do not create visualizations; they focus on data quality and structure.
What could happen if raw data contracts are missing?
Without contracts, unexpected data can cause errors in pipelines.
A raw data contract might specify that a column must be:
Contracts define expected data types and nullability to ensure data quality.
In dbt, why is defining raw data contracts important before transformations?
Contracts help catch issues early, ensuring transformations work on clean data.
Explain why defining raw data contracts for sources is important in a data pipeline.
Think about what happens if bad data enters your system.
Describe what kind of rules might be included in a raw data contract.
Consider how you would check if data is 'correct' before using it.