
Flume for log collection in Hadoop - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual
intermediate
What is the primary role of a Flume agent in log collection?

In Apache Flume, what does an agent mainly do when collecting logs?

A. It converts logs into SQL queries for database insertion.
B. It collects data from sources, processes it, and sends it to sinks.
C. It only monitors the health of the Hadoop cluster.
D. It stores logs permanently in HDFS without processing.
💡 Hint

Think about the flow of data from where logs are generated to where they are stored.

query_result
intermediate
What output does this Flume configuration produce?

Given this Flume configuration snippet, what is the expected output destination for the logs?

agent.sources = source1
agent.channels = channel1
agent.sinks = sink1

agent.sources.source1.type = exec
agent.sources.source1.command = tail -F /var/log/syslog

agent.channels.channel1.type = memory
agent.channels.channel1.capacity = 1000

agent.sinks.sink1.type = hdfs
agent.sinks.sink1.hdfs.path = hdfs://namenode/flume/logs/
agent.sinks.sink1.hdfs.filePrefix = syslog-

agent.sources.source1.channels = channel1
agent.sinks.sink1.channel = channel1

A. Logs are sent to a Kafka topic named 'syslog'.
B. Logs are collected from /var/log/syslog and stored locally on the agent machine.
C. Logs are continuously collected from /var/log/syslog and stored in HDFS under /flume/logs/ with prefix 'syslog-'.
D. Logs are discarded after being read from /var/log/syslog.
💡 Hint

Look at the sink type and path settings.
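
As a rough illustration of how the sink settings shape what lands in HDFS, the sketch below repeats the sink from the snippet above and adds a few standard HDFS sink properties. The added properties and their values are assumptions for illustration and are not part of the original configuration.

```properties
# Sketch only: the last five properties and their values are illustrative assumptions.
agent.sinks.sink1.type = hdfs
agent.sinks.sink1.hdfs.path = hdfs://namenode/flume/logs/
agent.sinks.sink1.hdfs.filePrefix = syslog-
# Write plain text instead of the default SequenceFile format.
agent.sinks.sink1.hdfs.fileType = DataStream
agent.sinks.sink1.hdfs.writeFormat = Text
# Roll to a new file every 300 seconds.
agent.sinks.sink1.hdfs.rollInterval = 300
# A value of 0 disables size-based and count-based rolling.
agent.sinks.sink1.hdfs.rollSize = 0
agent.sinks.sink1.hdfs.rollCount = 0
```

The sink type decides which system receives the events, while hdfs.path and hdfs.filePrefix decide the directory and the start of each file name.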

📝 Syntax
advanced
Identify the syntax error in this Flume configuration snippet

The snippet below is valid as written. Which of the following lines contains a property-name error that would keep the affected component from being configured?

agent.sources = source1
agent.channels = channel1
agent.sinks = sink1

agent.sources.source1.type = exec
agent.sources.source1.command = tail -F /var/log/syslog

agent.channels.channel1.type = memory
agent.channels.channel1.capacity = 1000

agent.sinks.sink1.type = hdfs
agent.sinks.sink1.hdfs.path = hdfs://namenode/flume/logs/
agent.sinks.sink1.hdfs.filePrefix = syslog-

agent.sources.source1.channels = channel1
agent.sinks.sink1.channel = channel1

A. agent.sinks.sink1.channel = channel1
B. agent.sources.source1.channels = channel1
C. agent.channels.channel1.capacity = 1000
D. agent.sinks.sink1.channels = channel1
💡 Hint

Check the property names for sinks connecting to channels.
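
To make the difference between the two wiring properties concrete, here is a minimal sketch in which one source feeds two channels and each sink drains exactly one. The names channel2 and sink2 and the logger sink are hypothetical additions for illustration; they do not appear in the snippet above.

```properties
# Sketch only: channel2, sink2 and the logger sink are hypothetical additions.
agent.sources = source1
agent.channels = channel1 channel2
agent.sinks = sink1 sink2

agent.sources.source1.type = exec
agent.sources.source1.command = tail -F /var/log/syslog
# Sources use the plural property: a space-separated list of channels.
agent.sources.source1.channels = channel1 channel2

agent.channels.channel1.type = memory
agent.channels.channel2.type = memory

agent.sinks.sink1.type = hdfs
agent.sinks.sink1.hdfs.path = hdfs://namenode/flume/logs/
# Sinks use the singular property: each sink reads from exactly one channel.
agent.sinks.sink1.channel = channel1

agent.sinks.sink2.type = logger
agent.sinks.sink2.channel = channel2
```

With the default replicating channel selector, each event from source1 is copied into both channels, so the HDFS sink and the logger sink each receive every event.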

optimization
advanced
How to optimize Flume for high log throughput?

You want to improve Flume's performance to handle a large volume of logs with minimal delay. Which option is the best optimization?

A. Use a memory channel instead of a file channel to reduce disk I/O latency.
B. Disable batch processing to send each event immediately.
C. Set the source type to 'avro' to compress logs before sending.
D. Increase the number of sinks but keep a single channel to avoid complexity.
💡 Hint

Think about buffering and speed trade-offs.
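
The buffering trade-off can be sketched with a handful of settings. The component names follow the earlier snippets, and every number below is an illustrative assumption rather than a recommended value.

```properties
# Sketch only: all numbers are illustrative assumptions, not tuned recommendations.
agent.channels.channel1.type = memory
# Maximum number of events buffered in the channel.
agent.channels.channel1.capacity = 10000
# Maximum events per transaction; keep it at least as large as the batch sizes below.
agent.channels.channel1.transactionCapacity = 1000

# Exec source: number of events sent to the channel at a time.
agent.sources.source1.batchSize = 500
# HDFS sink: number of events written before the file is flushed to HDFS.
agent.sinks.sink1.hdfs.batchSize = 500
```

A memory channel keeps events in RAM, so it avoids disk writes but loses buffered events if the agent process stops. A file channel is slower and survives restarts.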

🔧 Debug
expert
Why does this Flume agent fail to deliver logs to HDFS?

Given this Flume agent configuration, logs are not appearing in HDFS. What is the most likely cause?

agent.sources = source1
agent.channels = channel1
agent.sinks = sink1

agent.sources.source1.type = exec
agent.sources.source1.command = tail -F /var/log/syslog

agent.channels.channel1.type = file
agent.channels.channel1.checkpointDir = /tmp/flume/checkpoint
agent.channels.channel1.dataDirs = /tmp/flume/data

agent.sinks.sink1.type = hdfs
agent.sinks.sink1.hdfs.path = hdfs://namenode/flume/logs/
agent.sinks.sink1.hdfs.filePrefix = syslog-

agent.sources.source1.channels = channel1
agent.sinks.sink1.channel = channel1

A. The file channel directories (/tmp/flume/checkpoint and /tmp/flume/data) do not have proper write permissions.
B. The source command 'tail -F /var/log/syslog' is incorrect and does not produce output.
C. The sink type 'hdfs' is deprecated and should be replaced with 'hdfs_sink'.
D. The channel type 'file' is incompatible with the exec source.
💡 Hint

Check file system permissions for channel directories.
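
One way to rule out directory problems is to point the file channel at locations created specifically for the agent. The sketch below assumes hypothetical paths under /var/lib/flume that already exist and are writable by the user running the Flume agent; neither path comes from the original configuration.

```properties
# Sketch only: both paths are hypothetical and must be writable by the agent's user.
agent.channels.channel1.type = file
# Directory where the channel stores its checkpoint.
agent.channels.channel1.checkpointDir = /var/lib/flume/checkpoint
# Comma-separated list of directories where event data files are stored.
agent.channels.channel1.dataDirs = /var/lib/flume/data
```

If the agent cannot write to these directories, the file channel cannot accept events, so nothing reaches the sink even though the source and sink settings are valid.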