Hadoop · data · ~10 mins

Lambda architecture (batch + streaming) in Hadoop - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
1. Fill in the blank (easy)

Complete the command to read a file from HDFS using the Hadoop file system shell.

Hadoop
hadoop fs -cat [1]/input.txt
Options:
A. /tmp
B. /data
C. /user/hadoop
D. /input
Common Mistakes
Using a local file system path instead of an HDFS path.
Choosing a directory that does not contain the input file.
2. Fill in the blank (medium)

Complete the Hadoop streaming mapper script so that it converts each input line to lowercase.

Hadoop
#!/bin/bash
# Read one record per line from stdin; -r keeps backslashes literal.
while IFS= read -r line; do
  [1]
done
Options:
A. echo "$line"
B. echo "$line" | tr '[:lower:]' '[:upper:]'
C. echo "$line" | tr '[:upper:]' '[:lower:]'
D. echo "$line" | rev
Common Mistakes
Reversing the translation and converting lowercase to uppercase.
Not using the tr command.
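
For comparison, the same lowercase mapper can be sketched in Python. This is a minimal sketch rather than the exercise's expected answer: the function name `lowercase_mapper` is hypothetical, and the sample input stands in for the records that Hadoop streaming would deliver on standard input.

```python
def lowercase_mapper(lines):
    """Yield each input line lowercased, without its trailing newline."""
    for line in lines:
        yield line.rstrip("\n").lower()


if __name__ == "__main__":
    # Under Hadoop streaming you would iterate over sys.stdin instead
    # of this hard-coded sample (an assumption made for illustration).
    for out in lowercase_mapper(["Hello World\n", "HADOOP\n"]):
        print(out)
```

Note that `str.lower()` and `tr '[:upper:]' '[:lower:]'` may treat non-ASCII text differently, so the two mappers are only interchangeable for plain ASCII input.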
3. Fill in the blank (hard)

Fix the error in the reducer code that sums counts from the mapper output.

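Hadoop
#!/usr/bin/env python3
import sys

current_word = None
current_count = 0

# Streaming delivers mapper output sorted by key, so equal words are adjacent.
for line in sys.stdin:
    word, count = line.strip().split('\t')
    count = int(count)
    if word == [1]:
        current_count += count
    else:
        # A new word starts: emit the finished total for the previous one.
        if current_word is not None:
            print(f"{current_word}\t{current_count}")
        current_word = word
        current_count = count

# Emit the total for the last word, if any input was read.
if current_word is not None:
    print(f"{current_word}\t{current_count}")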
Options:
A. current_word
B. word
C. count
D. line
Common Mistakes
Comparing with the new word variable instead of the current word.
Using the count variable in the comparison.
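
The reducer's reliance on sorted input can also be expressed with `itertools.groupby`, which groups adjacent records that share a key. The following is a minimal sketch under the same assumptions as the task (tab-separated `word<TAB>count` records, already sorted by word); the function name `sum_counts` and the sample data are hypothetical.

```python
from itertools import groupby


def sum_counts(lines):
    """Sum tab-separated (word, count) records that arrive grouped by word."""
    records = (line.rstrip("\n").split("\t", 1) for line in lines)
    # groupby only merges adjacent equal keys, hence the sorted-input assumption.
    for word, group in groupby(records, key=lambda rec: rec[0]):
        yield word, sum(int(count) for _, count in group)


if __name__ == "__main__":
    # Sample records standing in for sorted mapper output on stdin.
    sample = ["apple\t1\n", "apple\t2\n", "pear\t1\n"]
    for word, total in sum_counts(sample):
        print(f"{word}\t{total}")
```

If the input were not sorted, the same word could be emitted more than once, which is the same failure mode as in the hand-written loop.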
4. Fill in the blanks (hard)

Fill both blanks to create a batch layer job that reads from HDFS and writes output to a directory.

Hadoop
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
  -files mapper.sh,reducer.py \
  -input [1] \
  -output [2] \
  -mapper mapper.sh \
  -reducer reducer.py
Options:
A. /data/batch_input
B. /user/hadoop/batch_output
C. /tmp/input
D. /output
Common Mistakes
Using local file system paths instead of HDFS paths.
Writing output to an input directory.
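
One way to guard against the second mistake is to assemble the job command in code and check the paths before submitting. This is a minimal sketch, not part of the exercise: `build_streaming_command` is a hypothetical helper, the jar path is the one shown in the task, and the `-files` option (which ships the scripts to the cluster nodes) is included as an assumption about how the scripts are distributed.

```python
def build_streaming_command(
    input_dir,
    output_dir,
    jar="/usr/lib/hadoop-mapreduce/hadoop-streaming.jar",
):
    """Return the argv list for the streaming job; nothing is executed here."""
    if input_dir.rstrip("/") == output_dir.rstrip("/"):
        raise ValueError("output directory must differ from the input directory")
    return [
        "hadoop", "jar", jar,
        # Generic options such as -files must precede the streaming options.
        "-files", "mapper.sh,reducer.py",
        "-input", input_dir,
        "-output", output_dir,
        "-mapper", "mapper.sh",
        "-reducer", "reducer.py",
    ]


if __name__ == "__main__":
    # Hypothetical paths used only to show the resulting command line.
    print(" ".join(build_streaming_command("/example/in", "/example/out")))
```

The returned list could be passed to `subprocess.run(cmd, check=True)` on a machine with the Hadoop client installed. Hadoop also refuses to start a job whose output directory already exists, so the output path should be a fresh one.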
5. Fill in the blanks (hard)

Fill all three blanks to create a streaming layer command that reads from Kafka and writes to HDFS.

Hadoop
kafka-console-consumer.sh --bootstrap-server [1] --topic [2] --from-beginning | \
  hadoop fs -put - [3]/streaming_output/data.txt
Options:
A. localhost:9092
B. user_events
C. /user/hadoop
D. localhost:2181
Common Mistakes
Confusing the Kafka broker port (9092) with the ZooKeeper port (2181).
Using local file system paths instead of HDFS paths.
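
In a Lambda architecture, results from the batch layer and the streaming (speed) layer are typically combined when a query is answered. The following is a minimal sketch of that merge step, assuming both layers produce per-key counts and that the streaming view covers only events not yet included in the batch view; `merge_views` and the event names are hypothetical.

```python
from collections import Counter


def merge_views(batch_view, realtime_view):
    """Combine batch and speed-layer counts into one queryable view."""
    merged = Counter(batch_view)
    merged.update(realtime_view)  # adds counts key by key
    return dict(merged)


if __name__ == "__main__":
    # Hypothetical counts: batch output plus recent streaming increments.
    batch = {"login": 10, "click": 4}
    recent = {"click": 2, "logout": 1}
    print(merge_views(batch, recent))
```

If the two views overlapped in time, events would be counted twice, so a real system needs a rule for discarding streaming results once the batch layer has caught up.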