Complete the code to read data from HDFS using Hadoop streaming.
hadoop fs -cat [1]/input.txt

The correct HDFS path to the input data is /data, so the completed command is hadoop fs -cat /data/input.txt, which prints the file input.txt from that directory to stdout.
Complete the code to define a mapper function in Hadoop streaming that converts input text to lowercase.
#!/bin/bash
while read line; do
    echo [1]
done
The blank is filled with the tr command: the mapper converts each input line to lowercase by piping it through tr '[:upper:]' '[:lower:]'.
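As a quick check, the completed mapper can be run locally by piping sample text through it; a minimal sketch (the file name mapper.sh is just for illustration):

```shell
# Write the completed mapper to a file. The blank is filled with the
# tr pipeline from the answer above.
cat > mapper.sh <<'EOF'
#!/bin/bash
# Read each line from stdin and emit it lowercased.
while read line; do
    echo "$line" | tr '[:upper:]' '[:lower:]'
done
EOF
chmod +x mapper.sh

# Local smoke test: no Hadoop cluster needed.
echo "Hello HDFS" | ./mapper.sh
# prints: hello hdfs
```

Testing streaming scripts this way catches syntax errors before a job is ever submitted to the cluster.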
Fix the error in the reducer code that sums counts from the mapper output.
#!/usr/bin/env python3
import sys

current_word = None
current_count = 0

for line in sys.stdin:
    word, count = line.strip().split('\t')
    count = int(count)
    if word == [1]:
        current_count += count
    else:
        if current_word:
            print(f"{current_word}\t{current_count}")
        current_word = word
        current_count = count

if current_word == [1]:
    print(f"{current_word}\t{current_count}")
The reducer compares each incoming word with current_word so that counts for the same word accumulate; the final check after the loop emits the total for the last word once the input ends.
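The corrected reducer can be exercised locally by feeding it pre-sorted word<TAB>count pairs, just as Hadoop's shuffle would deliver them. A sketch with the blanks filled per the explanation above (the file name reducer.py is illustrative):

```shell
# reducer.py - sums counts for consecutive identical words on stdin.
cat > reducer.py <<'EOF'
#!/usr/bin/env python3
import sys

current_word = None
current_count = 0

for line in sys.stdin:
    word, count = line.strip().split('\t')
    count = int(count)
    if word == current_word:      # same word as before: accumulate
        current_count += count
    else:
        if current_word:          # emit the previous word's total
            print(f"{current_word}\t{current_count}")
        current_word = word
        current_count = count

if current_word == word:          # flush the last word's total
    print(f"{current_word}\t{current_count}")
EOF

# Local test with pre-sorted mapper-style output.
printf 'apple\t1\napple\t2\nbanana\t1\n' | python3 reducer.py
```

Because the shuffle phase guarantees sorted keys, the reducer only ever needs to track one word at a time.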
Fill both blanks to create a batch layer job that reads from HDFS and writes output to a directory.
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
    -input [1] \
    -output [2] \
    -mapper mapper.sh \
    -reducer reducer.py
The batch job reads input from /data/batch_input and writes output to /user/hadoop/batch_output.
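Before submitting a jar like this, streaming jobs are commonly smoke-tested on the command line, with sort standing in for Hadoop's shuffle phase. A sketch assuming a hypothetical word-count pair: wc_mapper.sh emits one word<TAB>1 pair per word (unlike the lowercase-only mapper above), and wc_reducer.py sums the counts:

```shell
# wc_mapper.sh (hypothetical): lowercase, split into words, emit "word<TAB>1".
cat > wc_mapper.sh <<'EOF'
#!/bin/bash
tr '[:upper:]' '[:lower:]' | awk '{ for (i = 1; i <= NF; i++) print $i "\t1" }'
EOF

# wc_reducer.py (hypothetical): sum counts for consecutive identical words.
cat > wc_reducer.py <<'EOF'
#!/usr/bin/env python3
import sys
current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.strip().split('\t')
    count = int(count)
    if word == current_word:
        current_count += count
    else:
        if current_word:
            print(f"{current_word}\t{current_count}")
        current_word, current_count = word, count
if current_word:
    print(f"{current_word}\t{current_count}")
EOF

# "sort" stands in for Hadoop's sort-and-shuffle between map and reduce.
printf 'Apple banana\napple\n' | bash wc_mapper.sh | sort | python3 wc_reducer.py
```

If this local pipeline produces the expected counts, the same scripts can be passed to -mapper and -reducer unchanged.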
Fill all three blanks to create a streaming layer command that reads from Kafka and writes to HDFS.
kafka-console-consumer.sh --bootstrap-server [1] --topic [2] --from-beginning | \
    hadoop fs -put - [3]/streaming_output/data.txt
The Kafka server is localhost:9092, the topic is user_events, and the HDFS output directory is /user/hadoop.