
Batch vs real-time ingestion in Hadoop - Practice Questions

Challenge - 5 Problems
🧠 Conceptual
intermediate
Understanding Batch Ingestion Timing

Which statement best describes when batch ingestion processes data in a Hadoop ecosystem?

A. Data is processed continuously as it arrives in small chunks.
B. Data is processed immediately upon arrival without any delay.
C. Data is processed only when a user requests it in real time.
D. Data is collected over a period and processed all at once at scheduled intervals.
💡 Hint

Think about how batch jobs usually run on a schedule, not instantly.
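
To make the timing difference concrete, here is a minimal Python sketch of the two ingestion styles. The function names `ingest_batch` and `ingest_stream` are hypothetical, and a fixed record count stands in for a scheduled time window; this is an illustration, not a Hadoop API.

```python
from typing import Iterable, List


def ingest_batch(records: Iterable[str], batch_size: int) -> List[List[str]]:
    """Collect records and hand them over in groups of batch_size.

    Assumption: a fixed record count stands in for a scheduled interval.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    for record in records:
        current.append(record)
        if len(current) == batch_size:
            batches.append(current)  # the "scheduled run" fires here
            current = []
    if current:
        batches.append(current)  # flush the final partial batch
    return batches


def ingest_stream(records: Iterable[str]) -> List[List[str]]:
    """Hand over every record on its own, as soon as it is seen."""
    return [[record] for record in records]


if __name__ == "__main__":
    incoming = ["r1", "r2", "r3", "r4", "r5"]
    print(ingest_batch(incoming, batch_size=2))
    print(ingest_stream(incoming))
```

The batch version produces a few large units of work, while the stream version produces one unit per record.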

Predict Output
intermediate
Output of a Real-time Data Stream Simulation

The following Python snippet simulates real-time ingestion by handling a stream one record at a time. What does it print after processing all 3 data points?

# Simulated stream of incoming records
stream = ['data1', 'data2', 'data3']
processed = []
# Handle each record individually, in arrival order
for item in stream:
    processed.append(item.upper())
print(processed)
A. ['DATA1', 'DATA2', 'DATA3']
B. ['data1', 'data2', 'data3']
C. ['Data1', 'Data2', 'Data3']
D. Error: 'upper' method not found
💡 Hint

Check what the upper() method does to strings.
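
If you want to try the method in isolation first, the short sketch below calls Python's built-in `str.upper()` on an unrelated string. The variable names and the sample value are made up for illustration.

```python
# Hypothetical sample record, unrelated to the question's data
record = "sensor-7 ok"

# str.upper() returns a new string; the original is left unchanged
converted = record.upper()

print(converted)
print(record)
```

Running it shows both the converted value and the untouched original, which is worth noting because Python strings are immutable.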

Data Output
advanced
Data Volume Comparison in Batch vs Real-time

Consider a Hadoop system where batch ingestion processes 10,000 records every hour, and real-time ingestion processes 100 records every minute. How many records does each process handle in 3 hours?

A. Batch: 10,000 records; Real-time: 6,000 records
B. Batch: 30,000 records; Real-time: 18,000 records
C. Batch: 30,000 records; Real-time: 300 records
D. Batch: 3,000 records; Real-time: 1,800 records
💡 Hint

Calculate total records by multiplying rate by time for both methods.
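
The rate-times-time calculation from the hint can be sketched in Python as follows. The helper `total_records` and its parameter names are hypothetical; the sketch assumes both rates stay constant and that the interval divides evenly into the duration.

```python
def total_records(records_per_interval: int,
                  interval_minutes: int,
                  duration_hours: int) -> int:
    """Total records handled at a constant rate over a duration.

    Assumption: interval_minutes divides evenly into the duration.
    """
    intervals = (duration_hours * 60) // interval_minutes
    return records_per_interval * intervals


if __name__ == "__main__":
    # Figures taken from the question: hourly batch vs per-minute real-time
    batch_total = total_records(10_000, interval_minutes=60, duration_hours=3)
    realtime_total = total_records(100, interval_minutes=1, duration_hours=3)
    print("Batch:", batch_total)
    print("Real-time:", realtime_total)
```

The key step is converting both rates to the same time unit before multiplying.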

🔧 Debug
advanced
Identifying Error in Real-time Data Processing Code

What error will the following Python code, which simulates processing a data stream, produce?

data_stream = ['a', 'b', 'c']
result = []
for d in data_stream:
    result.append(d / 2)
print(result)
A. TypeError: unsupported operand type(s) for /: 'str' and 'int'
B. SyntaxError: invalid syntax
C. IndexError: list index out of range
D. No error; output will be ['a', 'b', 'c']
💡 Hint

Consider what happens when dividing a string by a number.
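
Stream records often arrive as text, so arithmetic usually needs an explicit conversion first. The sketch below shows one possible way to handle that; the function name `halve_all` and the sample values are hypothetical, and skipping non-numeric records is just one policy among several.

```python
from typing import Iterable, List, Tuple


def halve_all(values: Iterable[str]) -> Tuple[List[float], List[str]]:
    """Convert each text record to a number and halve it.

    Records that cannot be parsed as numbers are set aside, not dropped.
    """
    halved: List[float] = []
    skipped: List[str] = []
    for value in values:
        try:
            halved.append(float(value) / 2)
        except ValueError:
            # float() raises ValueError for non-numeric text such as 'b'
            skipped.append(value)
    return halved, skipped


if __name__ == "__main__":
    print(halve_all(["4", "b", "10"]))
```

Keeping the rejected records in a separate list makes it easy to log or reprocess them later.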

🚀 Application
expert
Choosing Ingestion Method for Time-sensitive Data

You manage a Hadoop system that collects sensor data. The sensors send data every second, and you need to detect anomalies within 5 seconds of data arrival. Which ingestion method is best to meet this requirement?

A. Batch ingestion, processing data every hour to reduce system load.
B. Batch ingestion, processing data once a day for thorough analysis.
C. Real-time ingestion, processing data as it arrives to detect anomalies quickly.
D. Neither; manual data checks are better for anomaly detection.
💡 Hint

Think about how quickly you need to respond to data to detect anomalies.
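
One way to reason about a latency requirement is to compare the worst-case wait before a record is processed against the deadline. The Python sketch below does that under a simplifying assumption: processing time itself is ignored, so a record arriving just after a run starts waits roughly one full processing interval. The function names are hypothetical.

```python
def worst_case_wait_seconds(processing_interval_seconds: float) -> float:
    """Longest time a record can wait before the next processing run.

    Assumption: processing time is ignored; only scheduling delay counts.
    """
    return processing_interval_seconds


def meets_deadline(processing_interval_seconds: float,
                   deadline_seconds: float = 5.0) -> bool:
    """True if the worst-case wait fits within the detection deadline."""
    return worst_case_wait_seconds(processing_interval_seconds) <= deadline_seconds


if __name__ == "__main__":
    # Candidate schedules: hourly, daily, and per-second processing
    for label, interval in [("hourly", 3600), ("daily", 86_400), ("per-second", 1)]:
        print(label, meets_deadline(interval))
```

A real system would also need to budget for network and processing time, so the actual margin is smaller than this sketch suggests.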