Hadoop · data · ~30 mins

Why ingestion pipelines feed the data lake in Hadoop
📖 Scenario: You work at a company that collects data from many sources like sales, website clicks, and customer feedback. All this data needs to be stored in one big place called a data lake so analysts can use it later.
🎯 Goal: Build a simple data ingestion pipeline that collects data from different sources and stores it in a data lake represented by a list. This will help you understand why ingestion pipelines feed the data lake.
📋 What You'll Learn
Create a dictionary called data_sources with three sources and their sample data
Create a list called data_lake to store all ingested data
Write a loop to add data from each source into the data_lake
Print the final data_lake to see all collected data
💡 Why This Matters
🌍 Real World
Companies collect data from many places like sales, websites, and customer feedback. They use ingestion pipelines to gather all this data into a data lake for easy access and analysis.
💼 Career
Understanding data ingestion pipelines is important for data engineers and data scientists who prepare data for analysis and machine learning.
1. Create the data sources dictionary
Create a dictionary called data_sources with these exact entries: 'sales': [100, 200, 150], 'website_clicks': [300, 400], and 'customer_feedback': ['good', 'bad'].
Need a hint?

Use curly braces {} to create a dictionary with keys and lists as values.
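A minimal sketch of this step, using exactly the sample values from the prompt:

```python
# Each key names a source; each value is that source's sample data
data_sources = {
    'sales': [100, 200, 150],
    'website_clicks': [300, 400],
    'customer_feedback': ['good', 'bad'],
}
```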

2. Create the data lake list
Create an empty list called data_lake to store all ingested data.
Need a hint?

Use square brackets [] to create an empty list.
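This step is a single line; the lake starts empty and fills up during ingestion:

```python
# The data lake starts empty; ingested records will accumulate here
data_lake = []
```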

3. Ingest data into the data lake
Use a for loop with variables source and data to iterate over data_sources.items(). Inside the loop, use data_lake.extend(data) to add each source's data to the data_lake.
Need a hint?

Use for source, data in data_sources.items(): to loop through the dictionary. Use extend() to add list items.
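A self-contained sketch of the ingestion loop (it repeats the setup from steps 1 and 2 so it runs on its own):

```python
data_sources = {
    'sales': [100, 200, 150],
    'website_clicks': [300, 400],
    'customer_feedback': ['good', 'bad'],
}
data_lake = []

# .items() yields (key, value) pairs; extend() appends each element of the
# source's list individually (append() would nest the whole list instead)
for source, data in data_sources.items():
    data_lake.extend(data)
```

Note that `extend()` flattens each source's list into the lake, which is why the final result is one flat list rather than a list of lists.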

4. Print the data lake
Write print(data_lake) to display all the data collected in the data lake.
Need a hint?

Use print(data_lake) to show the final combined data.
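Putting all four steps together, the complete pipeline looks like this (dictionaries preserve insertion order in Python 3.7+, so the output order follows the order the sources were defined in):

```python
# Step 1: sources feeding the pipeline
data_sources = {
    'sales': [100, 200, 150],
    'website_clicks': [300, 400],
    'customer_feedback': ['good', 'bad'],
}

# Step 2: the (initially empty) data lake
data_lake = []

# Step 3: ingest every source's data into the lake
for source, data in data_sources.items():
    data_lake.extend(data)

# Step 4: inspect the collected data
print(data_lake)
# → [100, 200, 150, 300, 400, 'good', 'bad']
```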