Which of the following best describes the three main layers of Lambda Architecture?
Think about how Lambda Architecture handles both historical and real-time data.
The batch layer stores the master dataset and precomputes batch views. The speed layer handles real-time data for low latency. The serving layer merges results for queries.
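As a rough illustration of how the three layers relate, here is a minimal in-memory sketch. The event data, `build_view`, and `serve` are hypothetical names invented for this example; a real deployment would use a distributed store and a stream processor rather than Python lists.

```python
from collections import Counter

# Hypothetical (user, amount) events. The master dataset is append-only history.
master_dataset = [("user1", 60), ("user1", 40), ("user2", 150)]
# Events that arrived after the last batch run (assumed for illustration).
recent_events = [("user2", 20), ("user3", 30)]


def build_view(events):
    """Aggregate (user, amount) events into a per-user total."""
    view = Counter()
    for user, amount in events:
        view[user] += amount
    return dict(view)


def serve(batch_view, speed_view, user):
    """Serving layer: answer a query by combining both views."""
    return batch_view.get(user, 0) + speed_view.get(user, 0)


batch_view = build_view(master_dataset)  # batch layer: precomputed from all history
speed_view = build_view(recent_events)   # speed layer: covers only recent data

print(serve(batch_view, speed_view, "user2"))
```

The same aggregation function is reused for both views here only to keep the sketch short; in practice the batch and speed layers are often implemented with different tools.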
Given the batch view {'user1': 100, 'user2': 150} and the speed view {'user2': 20, 'user3': 30}, what does the serving layer return when it merges them by summing the values per user?
batch_view = {'user1': 100, 'user2': 150}
speed_view = {'user2': 20, 'user3': 30}
merged = {}
for user in set(batch_view) | set(speed_view):
    merged[user] = batch_view.get(user, 0) + speed_view.get(user, 0)
print(merged)
Remember to add values from both batch and speed views for each user.
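The serving layer merges the batch and speed views by summing the values for each user key, giving {'user1': 100, 'user2': 170, 'user3': 30} (key order may vary because the loop iterates over a set).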
What error will the following Spark Streaming (PySpark) code produce?
from pyspark.streaming import StreamingContext
ssc = StreamingContext(sc, 1)
lines = ssc.socketTextStream('localhost', 9999)
words = lines.flatMap(lambda line: line.split())
wordCounts = words.map(lambda word: (word, 1)).reduceByKey(lambda a, b: a + b)
wordCounts.pprint()
ssc.start()
ssc.awaitTermination()
Check if all required objects are initialized before use.
The code uses 'sc' (a SparkContext) without defining or initializing it, so when run as a standalone script the call StreamingContext(sc, 1) raises NameError: name 'sc' is not defined.
Given a text file with lines:
"apple banana apple"
"banana orange apple"
What is the output count of words after a Hadoop MapReduce word count job?
Input lines:
"apple banana apple"
"banana orange apple"
Map step: emits (word, 1) for each word
Reduce step: sums counts per word
Expected output format: (word, total_count)
Count how many times each word appears in all lines combined.
"apple" appears 3 times, "banana" 2 times, "orange" 1 time.
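The map, shuffle, and reduce steps described above can be imitated in plain Python to check the counts. This is only a single-process sketch of the idea, not Hadoop code; `map_step`, `shuffle`, and `reduce_step` are hypothetical helper names.

```python
from collections import defaultdict

lines = ["apple banana apple", "banana orange apple"]


def map_step(line):
    """Emit a (word, 1) pair for every word in the line."""
    return [(word, 1) for word in line.split()]


def shuffle(pairs):
    """Group the emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_step(word, counts):
    """Sum the counts collected for one word."""
    return (word, sum(counts))


mapped = [pair for line in lines for pair in map_step(line)]
word_counts = sorted(reduce_step(w, c) for w, c in shuffle(mapped).items())
print(word_counts)  # [('apple', 3), ('banana', 2), ('orange', 1)]
```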
You have a system that needs to process large historical data and also provide real-time analytics with low latency. Which scenario best justifies using Lambda Architecture?
Think about when both batch and streaming processing are needed together.
Lambda Architecture fits systems requiring both batch processing of large data and real-time streaming for immediate insights.