
Lambda architecture (batch + streaming) in Hadoop - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual
intermediate
Core components of Lambda Architecture

Which of the following best describes the three main layers of Lambda Architecture?

A. Batch layer for storing master data, speed layer for real-time views, and serving layer for query responses
B. Data ingestion layer, machine learning layer, and visualization layer
C. Storage layer, processing layer, and presentation layer
D. Data cleaning layer, batch processing layer, and streaming layer
💡 Hint

Think about how Lambda Architecture handles both historical and real-time data.

Predict Output
intermediate
Output of batch and speed layer data merge

Given the batch view {'user1': 100, 'user2': 150} and the speed view {'user2': 20, 'user3': 30}, what does the serving layer output when it sums the values per user? (The order of keys in the printed dictionary may vary.)

Python
batch_view = {'user1': 100, 'user2': 150}
speed_view = {'user2': 20, 'user3': 30}
merged = {}
for user in set(batch_view) | set(speed_view):
    merged[user] = batch_view.get(user, 0) + speed_view.get(user, 0)
print(merged)
A. {'user1': 100, 'user2': 170, 'user3': 30}
B. {'user1': 100, 'user2': 150, 'user3': 30}
C. {'user1': 100, 'user2': 20, 'user3': 30}
D. {'user1': 0, 'user2': 170, 'user3': 30}
💡 Hint

Remember to add values from both batch and speed views for each user.

🔧 Debug
advanced
Identify the error in streaming data processing code

What error will the following Spark Streaming (PySpark) code produce when run as a standalone script?

from pyspark.streaming import StreamingContext
ssc = StreamingContext(sc, 1)
lines = ssc.socketTextStream('localhost', 9999)
words = lines.flatMap(lambda line: line.split())
wordCounts = words.map(lambda word: (word, 1)).reduceByKey(lambda a, b: a + b)
wordCounts.pprint()
ssc.start()
ssc.awaitTermination()
A. TypeError because flatMap expects a list but gets a string
B. NameError because 'sc' (SparkContext) is not defined
C. RuntimeError because socketTextStream port is closed
D. SyntaxError due to missing colon in lambda
💡 Hint

Check if all required objects are initialized before use.

Data Output
advanced
Result of batch processing with Hadoop MapReduce

Given a text file with lines:
"apple banana apple"
"banana orange apple"
What word counts does a Hadoop MapReduce word count job output for this file?

Hadoop
Input lines:
"apple banana apple"
"banana orange apple"

Map step: emits (word, 1) for each word
Reduce step: sums counts per word

Expected output format: (word, total_count)
A. (apple, 3)\n(banana, 3)\n(orange, 1)
B. (apple, 2)\n(banana, 2)\n(orange, 1)
C. (apple, 3)\n(banana, 2)\n(orange, 1)
D. (apple, 3)\n(banana, 2)\n(orange, 2)
💡 Hint

Count how many times each word appears in all lines combined.
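
The map and reduce steps described above can be imitated in plain Python to see how the counts are formed. This is only a minimal sketch, not Hadoop code: the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample lines are hypothetical, and the sample deliberately differs from the file in the question.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.split():
            yield (word, 1)


def shuffle_phase(pairs):
    """Group emitted values by key, mimicking the step between map and reduce."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(grouped):
    """Sum the grouped counts for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(lines):
    """Run the three phases in order and return {word: total_count}."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    sample = ["red green red", "green blue"]  # hypothetical input, not the quiz file
    for word, total in sorted(word_count(sample).items()):
        print(f"({word}, {total})")
```

Running the same function on the two lines from the question is one way to check an answer before submitting it.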

🚀 Application
expert
Choosing Lambda Architecture for a use case

You have a system that needs to process large historical data and also provide real-time analytics with low latency. Which scenario best justifies using Lambda Architecture?

A. A simple calculator app performing local computations without data storage
B. A static website serving fixed content without updates
C. A batch-only system processing monthly sales reports without real-time needs
D. A social media platform analyzing user posts historically and showing live trending topics instantly
💡 Hint

Think about when both batch and streaming processing are needed together.
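
To make the combination of batch and streaming processing concrete, here is a toy sketch in plain Python. It assumes a hypothetical event log of (timestamp, topic) pairs, and the names master_dataset, recent_events, batch_layer, speed_layer, and serving_layer are illustrative only. A real deployment would more likely use something like MapReduce or Spark for the batch side and a stream processor for the speed side.

```python
from collections import Counter

# Hypothetical events already absorbed by the last batch run.
master_dataset = [(1, "hadoop"), (2, "spark"), (3, "hadoop")]
# Hypothetical events that arrived after the last batch run.
recent_events = [(4, "spark"), (5, "kafka")]


def batch_layer(events):
    """Recompute the complete view from the full historical dataset."""
    return Counter(topic for _, topic in events)


def speed_layer(events):
    """Count, one event at a time, only what the batch view has not yet seen."""
    view = Counter()
    for _, topic in events:
        view[topic] += 1
    return view


def serving_layer(batch_view, speed_view, topic):
    """Answer a query by adding the batch and real-time counts for a topic."""
    return batch_view[topic] + speed_view[topic]


if __name__ == "__main__":
    batch_view = batch_layer(master_dataset)
    speed_view = speed_layer(recent_events)
    for topic in sorted(set(batch_view) | set(speed_view)):
        print(topic, serving_layer(batch_view, speed_view, topic))
```

The batch view is accurate but stale, while the speed view is fresh but covers only recent events. A query needs both, which is the situation the hint points to.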