In Apache Flume, what does an agent mainly do when collecting logs?
Think about the flow of data from where logs are generated to where they are stored.
A Flume agent is a JVM process that hosts sources, channels, and sinks. Sources ingest events (for example, log lines), channels buffer those events until they are consumed, and sinks deliver them to a destination such as HDFS or HBase.
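To make the source, channel, and sink roles concrete, here is a minimal sketch of a single-agent configuration. The agent name `agent`, the component names, and the port are illustrative assumptions rather than values required by Flume.

```properties
# Declare the components hosted by this agent (names are illustrative).
agent.sources = source1
agent.channels = channel1
agent.sinks = sink1

# Source: listens on a TCP port and turns each received line into an event.
agent.sources.source1.type = netcat
agent.sources.source1.bind = localhost
agent.sources.source1.port = 44444

# Channel: buffers events in memory between the source and the sink.
agent.channels.channel1.type = memory

# Sink: writes events to the agent's log, which is handy for a first test.
agent.sinks.sink1.type = logger

# Wiring: a source can feed several channels, a sink drains exactly one.
agent.sources.source1.channels = channel1
agent.sinks.sink1.channel = channel1
```

When an agent is started with `flume-ng agent`, the value passed to `--name` has to match the property prefix (here `agent`), otherwise none of these components are loaded.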
Given this Flume configuration snippet, what is the expected output destination for the logs?
agent.sources = source1
agent.channels = channel1
agent.sinks = sink1
agent.sources.source1.type = exec
agent.sources.source1.command = tail -F /var/log/syslog
agent.channels.channel1.type = memory
agent.channels.channel1.capacity = 1000
agent.sinks.sink1.type = hdfs
agent.sinks.sink1.hdfs.path = hdfs://namenode/flume/logs/
agent.sinks.sink1.hdfs.filePrefix = syslog-
agent.sources.source1.channels = channel1
agent.sinks.sink1.channel = channel1
Look at the sink type and path settings.
The sink type is 'hdfs', so the lines read by the exec source are written under hdfs://namenode/flume/logs/ in files whose names start with the configured prefix 'syslog-'.
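How the files actually look in HDFS depends on a few more sink properties that the snippet leaves at their defaults. The following sketch shows settings that are often adjusted for plain-text logs. The specific values are illustrative assumptions, not recommendations from the original configuration.

```properties
# Write raw event bodies instead of the default SequenceFile container.
agent.sinks.sink1.hdfs.fileType = DataStream
agent.sinks.sink1.hdfs.writeFormat = Text

# Roll to a new file every 5 minutes (illustrative value).
agent.sinks.sink1.hdfs.rollInterval = 300

# Setting a roll trigger to 0 disables it, so here files are
# rolled by time only, not by size or event count.
agent.sinks.sink1.hdfs.rollSize = 0
agent.sinks.sink1.hdfs.rollCount = 0
```

With the default roll settings the sink closes files very frequently, which tends to produce many small files in the target directory.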
Which option contains a syntax error that will prevent the Flume agent from starting?
agent.sources = source1
agent.channels = channel1
agent.sinks = sink1
agent.sources.source1.type = exec
agent.sources.source1.command = tail -F /var/log/syslog
agent.channels.channel1.type = memory
agent.channels.channel1.capacity = 1000
agent.sinks.sink1.type = hdfs
agent.sinks.sink1.hdfs.path = hdfs://namenode/flume/logs/
agent.sinks.sink1.hdfs.filePrefix = syslog-
agent.sources.source1.channels = channel1
agent.sinks.sink1.channel = channel1
Check the property names for sinks connecting to channels.
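The correct property to link a sink to a channel is 'agent.sinks.sink1.channel' (singular), because a sink drains exactly one channel. 'agent.sinks.sink1.channels' (plural) is not a valid sink property: the sink is left without a channel, fails Flume's configuration validation, and never delivers events. The plural form is valid only for sources, which can write to several channels.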
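The singular/plural distinction is easy to miss, so here is a side-by-side sketch. The invalid line is commented out so that the fragment stays a valid configuration.

```properties
# Sources use the plural form and may list several channels.
agent.sources.source1.channels = channel1

# Invalid: sinks do not have a 'channels' property.
# agent.sinks.sink1.channels = channel1

# Valid: a sink is bound to exactly one channel.
agent.sinks.sink1.channel = channel1
```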
You want to improve Flume's performance to handle a large volume of logs with minimal delay. Which option is the best optimization?
Think about buffering and speed trade-offs.
Memory channels buffer events faster than file channels because they avoid disk I/O, which improves throughput. The trade-off is durability: events still held in memory are lost if the agent crashes or restarts. Increasing the number of sinks or disabling batching can hurt performance, and the 'avro' source type has nothing to do with compression here.
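As a rough sketch of what such a tuning could look like, the fragment below enlarges the memory channel and lets the HDFS sink take events in larger batches. All numbers are illustrative assumptions and would need to be sized against the agent's JVM heap and the actual event rate.

```properties
# Memory channel: fast, but events are lost if the agent process dies.
agent.channels.channel1.type = memory

# Maximum number of events held in the channel (illustrative value).
agent.channels.channel1.capacity = 10000

# Maximum number of events per put/take transaction (illustrative value).
agent.channels.channel1.transactionCapacity = 1000

# Events written per batch before flushing to HDFS; this should not
# exceed the channel's transactionCapacity.
agent.sinks.sink1.hdfs.batchSize = 1000
```

If losing buffered events is not acceptable, a file channel is the safer choice at the cost of some throughput.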
Given this Flume agent configuration, logs are not appearing in HDFS. What is the most likely cause?
agent.sources = source1
agent.channels = channel1
agent.sinks = sink1
agent.sources.source1.type = exec
agent.sources.source1.command = tail -F /var/log/syslog
agent.channels.channel1.type = file
agent.channels.channel1.checkpointDir = /tmp/flume/checkpoint
agent.channels.channel1.dataDirs = /tmp/flume/data
agent.sinks.sink1.type = hdfs
agent.sinks.sink1.hdfs.path = hdfs://namenode/flume/logs/
agent.sinks.sink1.hdfs.filePrefix = syslog-
agent.sources.source1.channels = channel1
agent.sinks.sink1.channel = channel1
Check file system permissions for channel directories.
File channels require writable directories for checkpoint and data storage. If the user running Flume cannot write to /tmp/flume/checkpoint or /tmp/flume/data, the channel cannot buffer events, so nothing reaches the HDFS sink. The source command is valid, the 'hdfs' sink type is correct, and a file channel works with an exec source.
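One way to avoid this class of problem is to point the file channel at dedicated directories owned by the Flume user, rather than at locations under /tmp. The paths below are hypothetical and only illustrate the idea.

```properties
agent.channels.channel1.type = file

# Hypothetical paths: both directories must exist (or be creatable) and
# be writable by the user that runs the Flume agent.
agent.channels.channel1.checkpointDir = /var/lib/flume/channel1/checkpoint

# dataDirs accepts a comma-separated list of directories.
agent.channels.channel1.dataDirs = /var/lib/flume/channel1/data
```

If the channel still fails to start, the agent's log file is usually the quickest place to confirm whether a permission or locking error on these directories is the cause.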