beginner

What is a data lake?

A data lake is a large storage system that holds raw data in its original form. It can store structured, semi-structured, and unstructured data, such as tables, logs, text, or images, without requiring a schema to be defined first.

beginner

What is an ingestion pipeline in data processing?

An ingestion pipeline is a sequence of steps that collects data from different sources and moves it into a storage system such as a data lake. It automates the transfer so that data arrives reliably, either in batches or as a continuous stream.
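The collect-and-move steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a local temporary directory stands in for the data lake, and the source names (`app_logs`, `orders`), the `raw/<source>/<date>/` layout, and the `ingest` function are all hypothetical, not part of any particular tool.

```python
import json
import tempfile
from datetime import date
from pathlib import Path


def ingest(records_by_source, lake_root, day):
    """Write each source's records, unchanged, under lake_root/raw/<source>/<day>/."""
    written = []
    for source, records in records_by_source.items():
        target_dir = Path(lake_root) / "raw" / source / day.isoformat()
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / "part-0000.jsonl"
        with target.open("w", encoding="utf-8") as handle:
            for record in records:
                # One JSON document per line; no fields are dropped or renamed.
                handle.write(json.dumps(record) + "\n")
        written.append(target)
    return written


# Hypothetical sources standing in for an application log and an orders feed.
sources = {
    "app_logs": [{"level": "INFO", "msg": "user login"}],
    "orders": [{"order_id": 1, "total": 19.99}, {"order_id": 2, "total": 5.00}],
}
lake_root = tempfile.mkdtemp()  # a local folder plays the role of the lake
paths = ingest(sources, lake_root, date(2024, 1, 15))
for path in paths:
    print(path)
```

Partitioning by source and date is one common layout choice; it keeps each load separate so a failed run can be repeated without touching earlier data.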

intermediate

Why do ingestion pipelines feed data lakes instead of databases directly?

A database usually expects data to fit a predefined schema before it is written, while a data lake accepts raw data of any kind without changing it. Feeding a data lake keeps every detail available for later uses, such as analysis or machine learning.
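A small sketch can show why keeping the raw form matters: the same untouched records can serve different uses later. The event fields (`user`, `action`, `ms`, `page`) and both functions below are made up for illustration.

```python
import json

# Raw events exactly as they were ingested; nothing was filtered or reshaped.
raw_lines = [
    '{"user": "ana", "action": "view", "ms": 120, "page": "/home"}',
    '{"user": "ben", "action": "buy", "ms": 340, "page": "/cart"}',
    '{"user": "ana", "action": "buy", "ms": 95, "page": "/cart"}',
]


def purchases_per_user(lines):
    """Analytics use: count 'buy' events for each user."""
    counts = {}
    for line in lines:
        event = json.loads(line)
        if event["action"] == "buy":
            counts[event["user"]] = counts.get(event["user"], 0) + 1
    return counts


def average_latency_ms(lines):
    """Operations use: average the 'ms' field, which a sales-only table might have dropped."""
    timings = [json.loads(line)["ms"] for line in lines]
    return sum(timings) / len(timings)


print(purchases_per_user(raw_lines))
print(average_latency_ms(raw_lines))
```

Each reader applies its own interpretation when it reads the data, so a new question does not require ingesting the data again.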

intermediate

How does Hadoop support ingestion pipelines feeding data lakes?

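Hadoop provides HDFS for distributed storage, which can serve as the data lake, and processing frameworks such as MapReduce running on YARN. Because storage and processing are spread across a cluster, ingestion pipelines can load large amounts of data from many sources efficiently.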
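As a rough sketch, the final step of a pipeline that lands a file in HDFS (the Hadoop Distributed File System) could build the standard `hdfs dfs` commands as shown here. The file name, the target directory, and the `hdfs_put_commands` helper are hypothetical, and the example only prints the commands; running them assumes a machine with the Hadoop client installed and configured.

```python
import shlex


def hdfs_put_commands(local_file, hdfs_dir):
    """Return the hdfs CLI calls that create a directory and upload one file into it."""
    return [
        # -p creates parent directories and does not fail if the directory exists.
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_dir],
        # -f overwrites the destination file if it is already present.
        ["hdfs", "dfs", "-put", "-f", local_file, hdfs_dir],
    ]


# Hypothetical local file and lake path.
commands = hdfs_put_commands("events.jsonl", "/lake/raw/events")
for command in commands:
    print(shlex.join(command))
    # On a host with a Hadoop client, each command could be executed with
    # subprocess.run(command, check=True).
```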

beginner

What is one key benefit of feeding data lakes with ingestion pipelines?

Data is stored quickly and cheaply in one place, so teams can explore and analyze it at any time without first designing schemas or transformations.

What type of data does a data lake store?

A. Raw data in any format

B. Only images and videos

C. Only structured data

D. Only cleaned and processed data

What is the main role of an ingestion pipeline?

A. To visualize data

B. To analyze data

C. To delete old data

D. To collect and move data into storage

Why feed data lakes instead of databases directly?

A. Data lakes only store images

B. Databases are faster for raw data

C. Data lakes are cheaper and store raw data

D. Databases cannot store any data

Which Hadoop component helps store data in a data lake?

A. MapReduce

B. HDFS (Hadoop Distributed File System)

C. YARN

D. Hive

What is a benefit of using ingestion pipelines with data lakes?

A. Data is stored quickly and in raw form

B. Data is only stored in databases

C. Data is deleted after ingestion

D. Data is stored slowly and carefully

Explain why ingestion pipelines are important for feeding data lakes.

Describe how Hadoop supports ingestion pipelines feeding data lakes.