
Log management and troubleshooting in Hadoop - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output
intermediate
Understanding Hadoop Log Levels

Consider the following log4j logger configuration for Hadoop, which sets a different level for each package:

log4j.logger.org.apache.hadoop=INFO, console
log4j.logger.org.apache.hadoop.hdfs=WARN, console
log4j.logger.org.apache.hadoop.mapreduce=ERROR, console

Assuming log events of level INFO and WARN from org.apache.hadoop.hdfs and a log event of level WARN from org.apache.hadoop.mapreduce, which will be printed to the console?

A. Only the INFO log from org.apache.hadoop.hdfs is printed
B. Only the WARN log from org.apache.hadoop.hdfs is printed
C. Only the WARN log from org.apache.hadoop.mapreduce is printed
D. Both INFO from org.apache.hadoop.hdfs and WARN from org.apache.hadoop.mapreduce are printed
💡 Hint

Remember that a log event is printed only if its level is equal to or higher than the level configured for its logger.
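The threshold rule in the hint can be sketched in Python. This is a simplified model rather than log4j itself: the numeric ranks, the `effective_level` and `is_printed` helpers, and the `root_level` default are assumptions made for illustration, and appender additivity is ignored.

```python
# Hypothetical ranks that only mirror log4j's ordering: DEBUG < INFO < WARN < ERROR < OFF.
LEVELS = {"DEBUG": 10, "INFO": 20, "WARN": 30, "ERROR": 40, "OFF": 100}

# Logger levels taken from the configuration in the question.
CONFIG = {
    "org.apache.hadoop": "INFO",
    "org.apache.hadoop.hdfs": "WARN",
    "org.apache.hadoop.mapreduce": "ERROR",
}


def effective_level(logger_name, config=CONFIG, root_level="INFO"):
    """Walk up the dotted logger name until a configured ancestor is found."""
    name = logger_name
    while name:
        if name in config:
            return config[name]
        # Drop the last dotted component; becomes "" when none is left.
        name = name.rpartition(".")[0]
    return root_level  # assumed root logger level


def is_printed(logger_name, event_level, config=CONFIG):
    """An event passes when its level is at or above the logger's threshold."""
    return LEVELS[event_level] >= LEVELS[effective_level(logger_name, config)]
```

For example, `is_printed("org.apache.hadoop.yarn", "INFO")` resolves to the `org.apache.hadoop` entry, because the most specific configured ancestor decides the threshold.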

Data Output
intermediate
Parsing Hadoop Logs for Error Counts

Given a Hadoop log file with lines like:

2024-06-01 12:00:01,234 INFO Client: Connection established
2024-06-01 12:00:02,345 ERROR DataNode: Disk failure detected
2024-06-01 12:00:03,456 WARN NameNode: High memory usage

Which Python code snippet counts the ERROR log entries by checking the log-level field itself, so that a message that merely contains the word "error" is not miscounted?

log_lines = [
    '2024-06-01 12:00:01,234 INFO Client: Connection established',
    '2024-06-01 12:00:02,345 ERROR DataNode: Disk failure detected',
    '2024-06-01 12:00:03,456 WARN NameNode: High memory usage'
]

# Count ERROR logs
A. error_count = len([line for line in log_lines if 'error' in line.lower()])
B. error_count = len([line for line in log_lines if line.startswith('ERROR')])
C. error_count = sum(1 for line in log_lines if line.split()[2] == 'ERROR')
D. error_count = sum(1 for line in log_lines if 'ERROR' in line)
💡 Hint

Check how the log level appears in each line and how to detect it.
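For log files messier than the three sample lines, a regular expression that names each field can be more robust than substring checks. The sketch below assumes every entry follows the `date time,millis LEVEL Source: message` layout shown above; `LOG_PATTERN` and `count_levels` are illustrative names, not part of any Hadoop tooling.

```python
import re
from collections import Counter

# Assumed layout: "YYYY-MM-DD HH:MM:SS,mmm LEVEL Source: message"
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<source>[^:]+): "
    r"(?P<message>.*)$"
)


def count_levels(lines):
    """Count entries per log level, skipping lines that do not match the layout."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match:
            counts[match.group("level")] += 1
    return counts
```

Because only the level field is inspected, an INFO line whose message mentions an error is still counted as INFO, and continuation lines such as stack traces are skipped rather than misread.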

🔧 Debug
advanced
Troubleshooting Missing Hadoop Logs

A Hadoop cluster admin notices that logs from org.apache.hadoop.mapreduce are missing from the log files, even though tasks are running. The log4j.properties file contains:

log4j.logger.org.apache.hadoop=INFO, console
log4j.logger.org.apache.hadoop.mapreduce=OFF, console

What is causing the MapReduce logs to be missing?

A. The console appender is not defined, so logs are not saved
B. The log4j.properties file is missing the root logger configuration
C. The log level OFF disables all logging for org.apache.hadoop.mapreduce
D. The INFO level is too low to capture MapReduce logs
💡 Hint

Check what the OFF log level means in log4j.
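When logs go missing, one quick check is to scan the properties file for loggers whose level suppresses output. The following sketch handles only simple `log4j.logger.<name>=<LEVEL>, <appenders>` lines; `find_disabled_loggers` is a hypothetical helper, and line continuations and other properties-file features are not supported.

```python
def find_disabled_loggers(properties_text):
    """Return the names of loggers whose configured level is OFF."""
    prefix = "log4j.logger."
    disabled = []
    for raw_line in properties_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and properties-style comments ("#" or "!").
        if not line or line[0] in "#!":
            continue
        if not line.startswith(prefix):
            continue
        key, _, value = line.partition("=")
        # The level is the first comma-separated item; appenders follow it.
        level = value.split(",")[0].strip().upper()
        if level == "OFF":
            disabled.append(key[len(prefix):].strip())
    return disabled
```

Run against the two-line configuration in this challenge, the helper returns `["org.apache.hadoop.mapreduce"]`.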

🚀 Application
advanced
Visualizing Hadoop Log Error Trends

You have extracted daily counts of ERROR logs from Hadoop over 7 days:

errors = [5, 7, 3, 8, 6, 10, 4]

Which Python code using matplotlib will correctly plot these error counts as a line chart with days on the x-axis labeled from 1 to 7?

import matplotlib.pyplot as plt
errors = [5, 7, 3, 8, 6, 10, 4]
A.
plt.plot(range(1, 8), errors)
plt.xlabel('Day')
plt.ylabel('Error Count')
plt.title('Hadoop Error Trends')
plt.show()
B.
plt.plot(errors)
plt.xlabel('Error Count')
plt.ylabel('Day')
plt.title('Hadoop Error Trends')
plt.show()
C.
plt.bar(range(7), errors)
plt.xlabel('Day')
plt.ylabel('Error Count')
plt.title('Hadoop Error Trends')
plt.show()
D.
plt.scatter(range(1, 8), errors)
plt.xlabel('Error Count')
plt.ylabel('Day')
plt.title('Hadoop Error Trends')
plt.show()
💡 Hint

Check axis labels and plot type for a line chart with days on x-axis.
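The challenge starts from counts that have already been extracted. One way such daily counts might be derived from raw log lines is sketched below, assuming the `date time LEVEL ...` layout used in the earlier parsing challenge; `daily_error_counts` is an illustrative name.

```python
from collections import Counter


def daily_error_counts(lines):
    """Return (date, count) pairs for ERROR entries, sorted by date."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # fields[0] is the date, fields[1] the time, fields[2] the level.
        if len(fields) >= 3 and fields[2] == "ERROR":
            counts[fields[0]] += 1
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    return sorted(counts.items())
```

Days without any ERROR entry do not appear in the result, so a plot over a fixed date range would need those days filled in with zero first.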

🧠 Conceptual
expert
Root Cause Analysis Using Hadoop Logs

During a Hadoop job failure, the logs show repeated java.net.ConnectException: Connection refused errors when a DataNode tries to connect to the NameNode. Which is the most likely root cause?

A. NameNode service is down or unreachable from DataNode
B. DataNode has disk failure causing connection errors
C. Network bandwidth is saturated causing slow connections
D. MapReduce job configuration has incorrect memory settings
💡 Hint

Think about what 'Connection refused' means in network communication.
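A basic reachability check from the affected host can help confirm or rule out a connectivity problem. This is a minimal sketch using Python's standard `socket` module; `can_connect` is a hypothetical helper, and the host name and port in the usage note are placeholders that should be replaced with the values from the cluster's `fs.defaultFS` setting.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises an OSError subclass on refusal or timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A call such as `can_connect("namenode.example.com", 8020)` (placeholder host and port) returning False suggests that nothing is listening on that port or that the host cannot be reached. It does not by itself say whether the service is down or a firewall is blocking the connection.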