What will be the output of the following Hadoop log configuration snippet when the log level is set to WARN?
log4j.logger.org.apache.hadoop=INFO, console
log4j.logger.org.apache.hadoop.hdfs=WARN, console
log4j.logger.org.apache.hadoop.mapreduce=ERROR, console
Assuming a log event of level INFO from org.apache.hadoop.hdfs and a log event of level WARN from org.apache.hadoop.mapreduce, which will be printed to the console?
Remember that a log event is printed only if its level is equal to or higher than the logger's configured level.
The org.apache.hadoop.hdfs logger is set to WARN, so the INFO event from that logger is ignored. The WARN event from org.apache.hadoop.mapreduce is below that logger's ERROR threshold, so it is also ignored. Only events at WARN or higher from org.apache.hadoop.hdfs (and ERROR or higher from org.apache.hadoop.mapreduce) are printed.
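The filtering rule above can be modeled in a few lines of Python. This is a minimal sketch of log4j's level comparison, not log4j itself; the numeric ranks and the logger table are assumptions taken from the snippet in the question.

```python
# Numeric ranks mirroring log4j's level ordering (lower = more verbose).
LEVELS = {"TRACE": 0, "DEBUG": 1, "INFO": 2, "WARN": 3, "ERROR": 4, "FATAL": 5}

# Logger thresholds from the configuration snippet above.
LOGGER_LEVELS = {
    "org.apache.hadoop": "INFO",
    "org.apache.hadoop.hdfs": "WARN",
    "org.apache.hadoop.mapreduce": "ERROR",
}

def is_printed(logger: str, event_level: str) -> bool:
    """An event is printed only if its level is >= the logger's configured level."""
    return LEVELS[event_level] >= LEVELS[LOGGER_LEVELS[logger]]

print(is_printed("org.apache.hadoop.hdfs", "INFO"))       # False: below WARN
print(is_printed("org.apache.hadoop.mapreduce", "WARN"))  # False: below ERROR
print(is_printed("org.apache.hadoop.hdfs", "WARN"))       # True
```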
Given a Hadoop log file with lines like:
2024-06-01 12:00:01,234 INFO Client: Connection established
2024-06-01 12:00:02,345 ERROR DataNode: Disk failure detected
2024-06-01 12:00:03,456 WARN NameNode: High memory usage
Which Python code snippet correctly counts the number of ERROR log entries?
log_lines = [
'2024-06-01 12:00:01,234 INFO Client: Connection established',
'2024-06-01 12:00:02,345 ERROR DataNode: Disk failure detected',
'2024-06-01 12:00:03,456 WARN NameNode: High memory usage'
]
# Count ERROR logs
Check how the log level appears in each line and how to detect it.
The log level appears as the third whitespace-separated token, after the date and the timestamp with its comma, so checking whether 'ERROR' occurs in the line (or comparing the level token to 'ERROR') works. An option that tests whether the line starts with 'ERROR' fails, because every line begins with the date. An option that compares the wrong token fails, because the token after the date is the time with its trailing comma, not the log level. A case-insensitive match works but is less direct than an exact comparison.
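Putting that together, a short sketch that counts ERROR entries by comparing the level token directly (the log lines are the ones from the question):

```python
log_lines = [
    '2024-06-01 12:00:01,234 INFO Client: Connection established',
    '2024-06-01 12:00:02,345 ERROR DataNode: Disk failure detected',
    '2024-06-01 12:00:03,456 WARN NameNode: High memory usage',
]

# The level is the third whitespace-separated token: date, time, level, ...
error_count = sum(1 for line in log_lines if line.split()[2] == 'ERROR')
print(error_count)  # 1
```

Comparing the exact token avoids false positives from messages that merely contain the word "ERROR" in their text.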
A Hadoop cluster admin notices that logs from org.apache.hadoop.mapreduce are missing in the log files, even though tasks are running. The log4j.properties file contains:
log4j.logger.org.apache.hadoop=INFO, console
log4j.logger.org.apache.hadoop.mapreduce=OFF, console
What is the cause of missing logs for MapReduce?
Check what the OFF log level means in log4j.
Setting the log level to OFF disables all logging for that logger, so no MapReduce logs appear. The console appender is defined, so the appender is not the problem. The parent logger's INFO setting does not help, because the more specific org.apache.hadoop.mapreduce logger overrides it with OFF. A missing root logger would affect all logs, not just MapReduce.
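A minimal log4j.properties sketch of the fix, assuming the console appender is defined elsewhere in the file: replace OFF with an actual level so MapReduce events are emitted again.

```properties
# Parent Hadoop logger stays at INFO
log4j.logger.org.apache.hadoop=INFO, console
# OFF suppressed everything; use a real level (WARN here) to restore logging
log4j.logger.org.apache.hadoop.mapreduce=WARN, console
```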
You have extracted daily counts of ERROR logs from Hadoop over 7 days:
errors = [5, 7, 3, 8, 6, 10, 4]
Which Python code using matplotlib will correctly plot these error counts as a line chart with days on the x-axis labeled from 1 to 7?
import matplotlib.pyplot as plt
errors = [5, 7, 3, 8, 6, 10, 4]
Check axis labels and plot type for a line chart with days on x-axis.
The correct snippet plots a line chart with days 1 to 7 on the x-axis and error counts on the y-axis. The other options fail: one draws a bar chart rather than a line chart, one swaps the axis labels, and one uses a scatter plot with swapped labels.
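A self-contained sketch of the correct approach, using the error counts from the question (the Agg backend and the marker/title choices here are illustrative, not part of the original options):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt

errors = [5, 7, 3, 8, 6, 10, 4]
days = list(range(1, 8))  # x-axis values 1 through 7

fig, ax = plt.subplots()
ax.plot(days, errors, marker="o")   # line chart, not bar/scatter
ax.set_xlabel("Day")                # days on the x-axis
ax.set_ylabel("ERROR count")        # counts on the y-axis
ax.set_xticks(days)                 # label every day 1..7
ax.set_title("Daily ERROR log counts")
fig.savefig("error_counts.png")
```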
During a Hadoop job failure, the logs show repeated java.net.ConnectException: Connection refused errors from DataNode to NameNode. Which is the most likely root cause?
Think about what 'Connection refused' means in network communication.
'Connection refused' means the target service is not accepting connections on that port, most likely because the NameNode process is down or unreachable. Disk failures or bandwidth problems produce different errors, and memory settings affect job execution, not connection establishment.
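A quick way to confirm this diagnosis is a TCP reachability probe from the DataNode host. This is a generic sketch using Python's standard library; the hostname is hypothetical, and 8020 is a commonly used NameNode RPC port (verify the port against your cluster's configuration).

```python
import socket

def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers ConnectionRefusedError, timeouts, DNS failures
        return False

# Hypothetical NameNode address and port:
# is_reachable("namenode.example.com", 8020)
```

If the probe returns False while the NameNode is supposedly running, check whether the process is up and whether it is bound to the expected interface and port.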