Hadoop · Data · ~15 mins

Why data lake architecture centralizes data in Hadoop - See It in Action

Why Data Lake Architecture Centralizes Data
📖 Scenario: Imagine a large company that collects data from many sources like sales, customer feedback, and website logs. They want to keep all this data in one place so everyone can use it easily.
🎯 Goal: You will create a simple example to show how data from different sources can be stored together in a data lake architecture using Python dictionaries. This will help you understand why data lakes centralize data.
📋 What You'll Learn
Create a dictionary called data_sources with three keys: 'sales', 'feedback', and 'logs'.
Each key should have a list of sample data strings as its value.
Create a variable called central_data_lake and set it to an empty list.
Use a for loop with variables source and records to iterate over data_sources.items().
Inside the loop, extend central_data_lake with the records.
Print the central_data_lake to show all data combined.
💡 Why This Matters
🌍 Real World
Companies collect data from many places like sales, customer feedback, and logs. A data lake stores all this data in one place so teams can analyze it easily.
💼 Career
Understanding data lake architecture helps you work with big data platforms like Hadoop and prepare data for analysis or machine learning.
1
Create the data sources dictionary
Create a dictionary called data_sources with these exact keys and values: 'sales' with ["sale1", "sale2"], 'feedback' with ["good", "bad"], and 'logs' with ["log1", "log2"].
Need a hint?

Use curly braces to create a dictionary. Each key should have a list of strings as its value.
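A sketch of what this step produces, using exactly the keys and sample values the instructions specify:

```python
# Step 1: three data sources, each keyed by name with a list of sample records
data_sources = {
    'sales': ["sale1", "sale2"],      # sample sales records
    'feedback': ["good", "bad"],      # sample customer feedback
    'logs': ["log1", "log2"],         # sample website log entries
}
```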

2
Create the central data lake list
Create a variable called central_data_lake and set it to an empty list [].
Need a hint?

Use square brackets to create an empty list.
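This step is a single line; the list starts empty and is filled in the next step:

```python
# Step 2: the central data lake, empty for now
central_data_lake = []
```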

3
Combine all data into the central data lake
Use a for loop with variables source and records to iterate over data_sources.items(). Inside the loop, extend central_data_lake with the records.
Need a hint?

Use for source, records in data_sources.items(): and inside the loop use central_data_lake.extend(records).
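Combining this step with the two before it, the loop could look like this. `data_sources.items()` yields `(key, value)` pairs, and `extend()` appends each record from the list individually rather than nesting the list itself:

```python
data_sources = {
    'sales': ["sale1", "sale2"],
    'feedback': ["good", "bad"],
    'logs': ["log1", "log2"],
}
central_data_lake = []

# Step 3: pour every source's records into the central data lake
for source, records in data_sources.items():
    central_data_lake.extend(records)
```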

4
Print the combined data
Write print(central_data_lake) to display all data combined in the central data lake.
Need a hint?

Use the print function to show the combined list.
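Putting all four steps together, the finished script could look like this (the names and sample values are the ones the steps prescribe):

```python
# Step 1: data from three separate sources
data_sources = {
    'sales': ["sale1", "sale2"],
    'feedback': ["good", "bad"],
    'logs': ["log1", "log2"],
}

# Step 2: the (initially empty) central data lake
central_data_lake = []

# Step 3: combine every source's records into one list
for source, records in data_sources.items():
    central_data_lake.extend(records)

# Step 4: show the combined data
print(central_data_lake)
# → ['sale1', 'sale2', 'good', 'bad', 'log1', 'log2']
```

Because Python dictionaries preserve insertion order, the records appear in the order the sources were defined: sales first, then feedback, then logs.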