Hadoop · data · ~30 mins

Why ingestion pipelines feed the data lake in Hadoop
📖 Scenario: You work at a company that collects data from many sources like sales, website clicks, and customer feedback. All this data needs to be stored in one big place called a data lake so analysts can use it later.
🎯 Goal: Build a simple data ingestion pipeline that collects data from different sources and stores it in a data lake represented by a list. This will help you understand why ingestion pipelines feed the data lake.
📋 What You'll Learn
Create a dictionary called data_sources with three sources and their sample data
Create a list called data_lake to store all ingested data
Write a loop to add data from each source into the data_lake
Print the final data_lake to see all collected data
💡 Why This Matters
🌍 Real World
Companies collect data from many places like sales, websites, and customer feedback. They use ingestion pipelines to gather all this data into a data lake for easy access and analysis.
💼 Career
Understanding data ingestion pipelines is important for data engineers and data scientists who prepare data for analysis and machine learning.
1. Create the data sources dictionary
Create a dictionary called data_sources with these exact entries: 'sales': [100, 200, 150], 'website_clicks': [300, 400], and 'customer_feedback': ['good', 'bad'].
Need a hint?

Use curly braces {} to create a dictionary with keys and lists as values.
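A minimal sketch of this step, using exactly the sample values from the prompt:

```python
# Each key names a source; each value is that source's sample data
data_sources = {
    'sales': [100, 200, 150],
    'website_clicks': [300, 400],
    'customer_feedback': ['good', 'bad'],
}
```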

2. Create the data lake list
Create an empty list called data_lake to store all ingested data.
Need a hint?

Use square brackets [] to create an empty list.
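This step is a single line; the lake starts empty and fills up during ingestion:

```python
# The data lake starts empty; ingested records will accumulate here
data_lake = []
```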

3. Ingest data into the data lake
Use a for loop with variables source and data to iterate over data_sources.items(). Inside the loop, use data_lake.extend(data) to add each source's data to the data_lake.
Need a hint?

Use for source, data in data_sources.items(): to loop through the dictionary. Use extend() to add list items.
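A self-contained sketch of the ingestion loop (it repeats the setup from steps 1 and 2 so it runs on its own):

```python
data_sources = {
    'sales': [100, 200, 150],
    'website_clicks': [300, 400],
    'customer_feedback': ['good', 'bad'],
}
data_lake = []

# .items() yields (key, value) pairs; extend() appends each element of the
# source's list individually (append() would nest the whole list instead)
for source, data in data_sources.items():
    data_lake.extend(data)
```

Note that `extend()` flattens each source's list into the lake, which is why the final result is one flat list rather than a list of lists.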

4. Print the data lake
Write print(data_lake) to display all the data collected in the data lake.
Need a hint?

Use print(data_lake) to show the final combined data.
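Putting all four steps together, the complete pipeline looks like this (dictionaries preserve insertion order in Python 3.7+, so the output order follows the order the sources were defined in):

```python
# Step 1: sources feeding the pipeline
data_sources = {
    'sales': [100, 200, 150],
    'website_clicks': [300, 400],
    'customer_feedback': ['good', 'bad'],
}

# Step 2: the (initially empty) data lake
data_lake = []

# Step 3: ingest every source's data into the lake
for source, data in data_sources.items():
    data_lake.extend(data)

# Step 4: inspect the collected data
print(data_lake)
# → [100, 200, 150, 300, 400, 'good', 'bad']
```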