Hadoop | data | ~30 mins

Data lake design patterns in Hadoop - Mini Project: Build & Apply

Data Lake Design Patterns with Hadoop
📖 Scenario: You work at a company that collects lots of data from different sources like sales, customer feedback, and website logs. You want to organize this data in a Hadoop data lake so it is easy to find and use later.
🎯 Goal: Build a simple data lake structure using Hadoop folders and files that follow common design patterns: raw data, cleaned data, and aggregated data.
📋 What You'll Learn
Create a dictionary called data_lake with keys for raw, cleaned, and aggregated data folders
Add a configuration variable called file_format set to parquet
Use a dictionary comprehension to create file paths for each data type using the file_format
Print the final data_lake_paths dictionary showing the full paths
💡 Why This Matters
🌍 Real World
Data lakes store large amounts of raw and processed data in Hadoop systems. Organizing data with clear folder and file naming helps teams find and use data efficiently.
💼 Career
Understanding data lake design patterns is important for data engineers and analysts working with big data platforms like Hadoop.
Step 1: Create the initial data lake structure
Create a dictionary called data_lake with these exact keys and values: 'raw': '/data/raw', 'cleaned': '/data/cleaned', and 'aggregated': '/data/aggregated'.
Hint: Use curly braces {} to create a dictionary with the specified keys and values.
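A minimal sketch of this step in Python, using the exact keys and values given above:

```python
# Map each data lake zone to its base directory.
data_lake = {
    'raw': '/data/raw',
    'cleaned': '/data/cleaned',
    'aggregated': '/data/aggregated',
}
```

Each key names a zone of the lake, and each value is the folder where that zone's files would live.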

Step 2: Add a file format configuration
Create a variable called file_format and set it to the string 'parquet'.
Hint: Use a simple assignment statement to create the variable file_format.
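One possible way to write this step is shown here. The second variable, file_name, is not part of the exercise; it is a hypothetical helper added only to show how the format could become a file extension.

```python
# Configuration: the storage format used for every file in the lake.
file_format = 'parquet'

# Hypothetical helper (not required by the exercise): build a file name
# from the configured format.
file_name = f"data.{file_format}"
```

Keeping the format in a single variable means that changing it in one place would update every path built from it.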

Step 3: Create file paths using dictionary comprehension
Use a dictionary comprehension to create a new dictionary called data_lake_paths. For each key and path in data_lake.items(), build a new path by appending a slash, the file name data, a dot, and the file_format extension. For example, the raw path should be /data/raw/data.parquet.
Hint: Use {key: value for key, value in dictionary.items()} and f-strings to build the new paths.
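The comprehension described in this step might look like the sketch below. The first two assignments repeat the earlier steps so the snippet runs on its own.

```python
# Setup repeated from steps 1 and 2 so this snippet is self-contained.
data_lake = {
    'raw': '/data/raw',
    'cleaned': '/data/cleaned',
    'aggregated': '/data/aggregated',
}
file_format = 'parquet'

# For every zone, append "/data.<file_format>" to its base directory.
data_lake_paths = {
    key: f"{path}/data.{file_format}"
    for key, path in data_lake.items()
}
```

The keys stay the same as in data_lake; only the values change, from folder paths to full file paths.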

Step 4: Print the final data lake paths
Write a print statement to display the data_lake_paths dictionary.
Hint: Use print(data_lake_paths) to show the dictionary.
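Putting the four steps together, a complete version of the exercise could look like this sketch. It assumes Python 3.7 or later, where dictionaries keep insertion order.

```python
# Step 1: base directory for each data lake zone.
data_lake = {
    'raw': '/data/raw',
    'cleaned': '/data/cleaned',
    'aggregated': '/data/aggregated',
}

# Step 2: file format shared by all zones.
file_format = 'parquet'

# Step 3: full file path for each zone.
data_lake_paths = {
    key: f"{path}/data.{file_format}"
    for key, path in data_lake.items()
}

# Step 4: display the result.
print(data_lake_paths)
# Prints:
# {'raw': '/data/raw/data.parquet', 'cleaned': '/data/cleaned/data.parquet', 'aggregated': '/data/aggregated/data.parquet'}
```

Note that this only builds path strings. It does not create any folders or files in HDFS.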