Challenge - 5 Problems

Batch vs Real-time Ingestion Master

🧠 Conceptual

intermediate

Understanding Batch Ingestion Timing

Which statement best describes when batch ingestion processes data in a Hadoop ecosystem?

A. Data is processed continuously as it arrives in small chunks.

B. Data is processed immediately upon arrival without any delay.

C. Data is processed only when a user requests it in real time.

D. Data is collected over a period and processed all at once at scheduled intervals.
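
To make the idea of scheduled intervals concrete, here is a minimal sketch in plain Python (not a Hadoop API). The function name `batch_by_interval`, the sample events, and the 60-second window are all hypothetical choices for illustration. It groups events by the interval in which they arrived, so that each group would be handled in a single scheduled run.

```python
from collections import defaultdict


def batch_by_interval(events, interval_seconds):
    """Group (timestamp, value) events into fixed-length windows.

    Each window stands in for one scheduled batch run (illustrative only).
    """
    windows = defaultdict(list)
    for timestamp, value in events:
        # Round the arrival time down to the start of its window.
        window_start = (timestamp // interval_seconds) * interval_seconds
        windows[window_start].append(value)
    return dict(sorted(windows.items()))


# Hypothetical events: (arrival time in seconds, payload).
events = [(5, "a"), (40, "b"), (65, "c"), (130, "d")]
batches = batch_by_interval(events, interval_seconds=60)

for window_start, values in batches.items():
    print(f"batch starting at {window_start}s: {values}")
```

A streaming pipeline would instead handle each event inside the loop as it arrives, without waiting for a window to close.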

❓ Predict Output

intermediate

Output of a Real-time Data Stream Simulation

Given the following Python snippet, which simulates real-time ingestion of a data stream, what is printed after all 3 data points are processed?

Python

stream = ['data1', 'data2', 'data3']
processed = []
for item in stream:
    processed.append(item.upper())
print(processed)

A. ['DATA1', 'DATA2', 'DATA3']

B. ['data1', 'data2', 'data3']

C. ['Data1', 'Data2', 'Data3']

D. Error: 'upper' method not found

❓ Data Output

advanced

Data Volume Comparison in Batch vs Real-time

Consider a Hadoop system where batch ingestion processes 10,000 records every hour, and real-time ingestion processes 100 records every minute. How many records does each process handle in 3 hours?

A. Batch: 10,000 records; Real-time: 6,000 records

B. Batch: 30,000 records; Real-time: 18,000 records

C. Batch: 30,000 records; Real-time: 300 records

D. Batch: 3,000 records; Real-time: 1,800 records
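
One way to check the arithmetic is a short script that converts both rates to the same 3-hour span. The constant names below are illustrative; the rates come from the question. The key step is converting the per-minute real-time rate to hours before multiplying.

```python
HOURS = 3
MINUTES_PER_HOUR = 60

# Rates taken from the question text.
BATCH_RECORDS_PER_HOUR = 10_000
REALTIME_RECORDS_PER_MINUTE = 100

# Batch: one run per hour, so multiply by the number of hours.
batch_total = BATCH_RECORDS_PER_HOUR * HOURS

# Real-time: convert the per-minute rate to the full span in minutes.
realtime_total = REALTIME_RECORDS_PER_MINUTE * MINUTES_PER_HOUR * HOURS

print(f"Batch: {batch_total:,} records; Real-time: {realtime_total:,} records")
```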

🔧 Debug

advanced

Identifying Error in Real-time Data Processing Code

What error does the following Python code, which simulates processing a data stream, produce?

data_stream = ['a', 'b', 'c']
result = []
for d in data_stream:
    result.append(d / 2)
print(result)

A. TypeError: unsupported operand type(s) for /: 'str' and 'int'

B. SyntaxError: invalid syntax

C. IndexError: list index out of range

D. No error; output will be ['a', 'b', 'c']
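
When debugging a snippet like this, it can help to wrap the loop in a `try`/`except` and print the exception type rather than guessing. The sketch below reuses the code from the question. The `error_name` variable is an addition for illustration, and the broad `except Exception` is used only to discover which error occurs, not as a recommended pattern for production code.

```python
data_stream = ['a', 'b', 'c']
result = []
error_name = None

try:
    for d in data_stream:
        # Same operation as in the question: divide each item by 2.
        result.append(d / 2)
except Exception as exc:
    # Record and report whichever exception the loop raised.
    error_name = type(exc).__name__
    print(f"{error_name}: {exc}")

print("items processed before failure:", len(result))
```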

🚀 Application

expert

Choosing Ingestion Method for Time-sensitive Data

You manage a Hadoop system that collects sensor data. The sensors send data every second, and you need to detect anomalies within 5 seconds of data arrival. Which ingestion method is best to meet this requirement?

A. Batch ingestion, processing data every hour to reduce system load.

B. Batch ingestion, processing data once a day for thorough analysis.

C. Real-time ingestion, processing data as it arrives to detect anomalies quickly.

D. Neither; manual data checks are better for anomaly detection.
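
A rough way to reason about this requirement is to compare how long a record might wait before processing against the 5-second deadline. The sketch below makes simplifying assumptions that are not in the question: a record waits at most one full processing interval, processing time itself is ignored, and real-time ingestion is modeled as a 1-second interval to match the sensor rate. The function name `meets_deadline` and the candidate labels are hypothetical.

```python
DEADLINE_SECONDS = 5


def meets_deadline(run_interval_seconds, deadline_seconds):
    """Return True if the worst-case wait fits within the deadline.

    Assumes a record waits at most one full interval before being picked up
    and ignores processing time (a simplification for illustration).
    """
    return run_interval_seconds <= deadline_seconds


# Assumed intervals in seconds between processing runs.
candidates = {
    "hourly batch": 60 * 60,
    "daily batch": 24 * 60 * 60,
    "real-time (per record)": 1,
}

results = {
    name: meets_deadline(interval, DEADLINE_SECONDS)
    for name, interval in candidates.items()
}

for name, ok in results.items():
    print(f"{name}: {'meets' if ok else 'misses'} the {DEADLINE_SECONDS}s deadline")
```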
