Which reason best explains why data lake architecture centralizes data?
Think about how data lakes handle different data types and storage.
Data lakes centralize data by storing all types of data (structured, unstructured, semi-structured) in one place. This makes it easier to access and analyze diverse data sets.
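The idea above can be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical in-memory registry of file paths grouped by data type; real lakes store files in a distributed filesystem such as HDFS or object storage.

```python
# A minimal sketch of a data lake's central store holding all three
# data types side by side (paths and categories are hypothetical).
lake = {
    "structured": ["sales/2024/orders.csv"],           # tables
    "semi-structured": ["logs/2024/events.json"],      # JSON logs
    "unstructured": ["docs/contracts/contract1.pdf"],  # documents
}

# Because everything lives in one place, a single scan covers all data.
all_files = [path for paths in lake.values() for path in paths]
print(len(all_files))
```

One registry, one scan: diverse data sets can be discovered and analyzed without visiting separate silos.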
What is a main benefit of having centralized data in a data lake architecture?
Consider how centralization affects managing data rules and security.
Centralizing data in a data lake allows organizations to apply consistent governance and security policies across all data, improving control and compliance.
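A tiny sketch of what "consistent governance" can mean in code, assuming a hypothetical role-based policy table that applies to every dataset in the lake (the roles and dataset names are invented for illustration):

```python
# One policy table governs all data in the centralized lake,
# so the same rule is enforced no matter the data type.
POLICY = {"analyst": {"sales", "logs"}, "admin": {"sales", "logs", "hr"}}

def can_read(role: str, dataset: str) -> bool:
    # Access is granted only if the role's policy lists the dataset.
    return dataset in POLICY.get(role, set())

print(can_read("analyst", "hr"))  # analysts are denied HR data
print(can_read("admin", "hr"))    # admins are allowed
```

Because there is a single policy store, updating a rule once changes it everywhere, which is the compliance benefit the answer describes.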
Given a Hadoop data lake storing different data types, which output shows the correct count of each data type stored?
data = [
{'type': 'structured', 'count': 1500},
{'type': 'unstructured', 'count': 3000},
{'type': 'semi-structured', 'count': 1200}
]
counts = {item['type']: item['count'] for item in data}
print(counts)
Look carefully at the count values for each data type in the list.
The dictionary comprehension maps each data type to its count as given in the list, so the output is {'structured': 1500, 'unstructured': 3000, 'semi-structured': 1200}.
Why does data lake architecture use schema-on-read instead of schema-on-write?
Think about when the data structure is applied in schema-on-read.
Schema-on-read means data is stored raw and the structure is applied only when reading it, allowing more flexibility for different analysis needs.
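Schema-on-read can be sketched with plain JSON strings: the records are stored raw, and a structure is imposed only when a query reads them. The field names below are hypothetical.

```python
import json

# Raw records are stored as-is; no schema is enforced at write time,
# so records may carry different fields.
raw_records = [
    '{"user": "a", "amount": 10, "note": "first"}',
    '{"user": "b", "amount": 25}',
]

# Schema-on-read: structure is applied only now, at query time.
# A different analysis could parse the same raw data another way.
def read_amounts(records):
    return [json.loads(r).get("amount", 0) for r in records]

print(sum(read_amounts(raw_records)))
```

Contrast this with schema-on-write, where the second record would have been rejected or padded at ingestion time to fit a fixed table definition.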
You have a Hadoop data lake storing all company data centrally. Which impact is most likely when centralizing data this way?
Consider how centralizing data affects access and network use in Hadoop.
Centralizing data in a Hadoop data lake improves accessibility but can increase network traffic when querying large datasets across nodes.