
Flume for log collection in Hadoop

Introduction
Apache Flume collects log data from many sources and moves it into a central store such as HDFS.
Typical situations where it fits:
You want to gather logs from multiple servers into one place for analysis.
You need to stream log data to Hadoop in near real time for processing.
You want web server logs to land in HDFS automatically.
You want to monitor application logs continuously without manual copying.
You need to move large volumes of log data efficiently.
Syntax
Hadoop
agent.sources = source1
agent.sinks = sink1
agent.channels = channel1

agent.sources.source1.type = exec
agent.sources.source1.command = tail -F /var/log/syslog

agent.sinks.sink1.type = hdfs
agent.sinks.sink1.hdfs.path = hdfs://namenode/flume/logs/

agent.channels.channel1.type = memory

agent.sources.source1.channels = channel1
agent.sinks.sink1.channel = channel1
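This is a minimal Flume configuration with one source, one sink, and one channel. The prefix agent is the agent's name and must match the name given when the agent is started.
Sources collect events, channels buffer them, and sinks deliver them to storage. A source can write to several channels (hence the plural channels property), while a sink reads from exactly one (the singular channel property).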
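The channel in the configuration above relies on the memory channel's default size. As a minimal sketch, the buffer can be sized explicitly with the capacity and transactionCapacity properties; the numbers below are illustrative assumptions, not recommendations.

```properties
# Memory channel with explicit sizing (values are illustrative assumptions)
agent.channels.channel1.type = memory
# Maximum number of events the channel can hold
agent.channels.channel1.capacity = 10000
# Maximum number of events moved in a single transaction
agent.channels.channel1.transactionCapacity = 1000
```

Keep transactionCapacity at or below capacity, otherwise the channel fails to configure.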
Examples
Collects web server access logs and stores them in HDFS.
Hadoop
agent.sources = source1
agent.sinks = sink1
agent.channels = channel1

agent.sources.source1.type = exec
agent.sources.source1.command = tail -F /var/log/access.log

agent.sinks.sink1.type = hdfs
agent.sinks.sink1.hdfs.path = hdfs://namenode/flume/access_logs/

agent.channels.channel1.type = memory

agent.sources.source1.channels = channel1
agent.sinks.sink1.channel = channel1
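With only a path set, the HDFS sink uses its defaults: it writes SequenceFiles and rolls to a new file after 30 seconds, 1024 bytes, or 10 events, which tends to produce many small files. The sketch below shows how the access-log sink could be tuned for plain-text output; the roll values and the dated directory layout are assumptions chosen for illustration.

```properties
# Write events as plain text instead of the default SequenceFile
agent.sinks.sink1.hdfs.fileType = DataStream
agent.sinks.sink1.hdfs.writeFormat = Text

# Roll to a new file every 5 minutes or at about 128 MB
agent.sinks.sink1.hdfs.rollInterval = 300
agent.sinks.sink1.hdfs.rollSize = 134217728
# 0 disables rolling by event count
agent.sinks.sink1.hdfs.rollCount = 0

# Optional: one directory per day. The escape sequences need a timestamp,
# which useLocalTimeStamp supplies when events carry no timestamp header.
agent.sinks.sink1.hdfs.path = hdfs://namenode/flume/access_logs/%Y-%m-%d/
agent.sinks.sink1.hdfs.useLocalTimeStamp = true
```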
Receives syslog messages over UDP on port 5140 and writes them to the agent's own log through the logger sink, which is mainly useful for testing.
Hadoop
agent.sources = source2
agent.sinks = sink2
agent.channels = channel2

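agent.sources.source2.type = syslogudp
# The Flume User Guide lists host as a required property; 0.0.0.0 binds to all interfaces
agent.sources.source2.host = 0.0.0.0
agent.sources.source2.port = 5140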

agent.sinks.sink2.type = logger

agent.channels.channel2.type = memory

agent.sources.source2.channels = channel2
agent.sinks.sink2.channel = channel2
Sample Program
This configuration collects system logs continuously and stores them in HDFS under /flume/syslog/.
Hadoop
# Flume config file example
agent.sources = source1
agent.sinks = sink1
agent.channels = channel1

agent.sources.source1.type = exec
agent.sources.source1.command = tail -F /var/log/syslog

agent.sinks.sink1.type = hdfs
agent.sinks.sink1.hdfs.path = hdfs://namenode/flume/syslog/

agent.channels.channel1.type = memory

agent.sources.source1.channels = channel1
agent.sinks.sink1.channel = channel1
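To run this configuration, the agent is started with the flume-ng launcher. The sketch below records the command as comments next to the lines it depends on; the file name conf/syslog-to-hdfs.conf is a hypothetical choice, not something the tutorial specifies.

```properties
# Assumption: the configuration above is saved as conf/syslog-to-hdfs.conf
# (hypothetical file name). The agent would then be started with:
#
#   flume-ng agent --conf conf --conf-file conf/syslog-to-hdfs.conf --name agent
#
# --name must match the prefix used in the property keys ("agent" here).
agent.sources = source1
agent.sinks = sink1
agent.channels = channel1
```

If the name passed to --name does not match the prefix, the agent starts with no components and collects nothing.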
Important Notes
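A Flume agent is a long-running process; it keeps collecting and forwarding events until it is stopped.
Channels buffer events between source and sink, so a slow or briefly unavailable sink does not immediately block the source. A memory channel is fast but loses buffered events if the agent stops; a file channel persists them to disk.
An agent can run several sources, channels, and sinks at once, for example one source feeding two channels that each drain to a different sink.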
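As a rough sketch of such a multi-sink setup, the configuration below fans one source out to two channels: a file channel feeding HDFS and a memory channel feeding a logger sink. The component names and the local directories under /var/flume are hypothetical and chosen only for illustration.

```properties
# One source fanned out to two channels, each drained by its own sink
agent.sources = source1
agent.channels = memChannel fileChannel
agent.sinks = hdfsSink logSink

agent.sources.source1.type = exec
agent.sources.source1.command = tail -F /var/log/syslog
# Replicating (the default selector) copies every event to all listed channels
agent.sources.source1.selector.type = replicating
agent.sources.source1.channels = memChannel fileChannel

# Durable channel: events are persisted to local disk (paths are assumptions)
agent.channels.fileChannel.type = file
agent.channels.fileChannel.checkpointDir = /var/flume/checkpoint
agent.channels.fileChannel.dataDirs = /var/flume/data

# Fast, non-durable channel
agent.channels.memChannel.type = memory

agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.hdfs.path = hdfs://namenode/flume/syslog/
agent.sinks.hdfsSink.channel = fileChannel

agent.sinks.logSink.type = logger
agent.sinks.logSink.channel = memChannel
```

Pairing the HDFS sink with the file channel keeps the stored copy durable across agent restarts, while the logger copy stays cheap.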
Summary
Flume collects logs from many sources and sends them to storage like HDFS.
It moves data through sources, channels, and sinks; how reliably depends on the channel type you choose.
You configure an agent in a plain-text properties file that defines these components and connects them.