Hadoop · ~30 mins

Flume for log collection in Hadoop - Mini Project: Build & Apply

Collecting Logs Using Flume in Hadoop
📖 Scenario: You are working as a system administrator for a company that needs to collect and store application logs efficiently. You will use Apache Flume to collect logs from a source and send them to Hadoop's HDFS for storage and analysis.
🎯 Goal: Build a simple Flume configuration to collect logs from a local file source and write them into HDFS.
📋 What You'll Learn
Create a Flume agent configuration file with a source, channel, and sink
Configure the source to read from a local log file
Configure the channel as a memory channel
Configure the sink to write logs to HDFS
Use exact names for the agent, source, channel, and sink as specified
💡 Why This Matters
🌍 Real World
Companies use Flume to collect logs from many servers and store them centrally in Hadoop for analysis and monitoring.
💼 Career
Understanding Flume configuration is important for roles in big data engineering, system administration, and data pipeline development.
1
Create Flume Agent and Source Configuration
Create a Flume agent named agent1. Add a source named src1 of type exec that runs the command tail -F /var/log/syslog to collect logs continuously.
💡 Hint: The source type exec runs a shell command to collect logs. Use tail -F /var/log/syslog to follow the system log file.
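The step above can be sketched as the opening lines of the agent's configuration file (the file name flume.conf is an assumption; the agent name, source name, and command come from the task):

```properties
# flume.conf — declare the agent's source and configure it
# to tail the system log continuously
agent1.sources = src1
agent1.sources.src1.type = exec
agent1.sources.src1.command = tail -F /var/log/syslog
```

The exec source runs the given command and turns each line of its output into a Flume event.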

2
Configure Memory Channel
Add a channel named ch1 of type memory to the agent agent1. Set the capacity to 10000 and the transaction capacity to 1000.
💡 Hint: The memory channel temporarily stores events in memory. Set capacity and transactionCapacity to control buffer sizes.
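Following the same flume.conf sketch, the channel section for this step would look like:

```properties
# Declare a memory channel with the buffer sizes from the task
agent1.channels = ch1
agent1.channels.ch1.type = memory
# capacity: maximum number of events buffered in the channel
agent1.channels.ch1.capacity = 10000
# transactionCapacity: maximum events per put/take transaction
agent1.channels.ch1.transactionCapacity = 1000
```

A memory channel is fast but loses buffered events if the agent process dies; a file channel trades speed for durability.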

3
Configure HDFS Sink
Add a sink named sink1 of type hdfs to the agent agent1. Set the HDFS path to /user/logs/ and the file prefix to log-.
💡 Hint: The sink writes events to HDFS. Set the hdfs.path and hdfs.filePrefix to organize files.
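Continuing the flume.conf sketch, the sink section for this step would look like (the path and prefix values come from the task):

```properties
# Declare an HDFS sink that writes collected logs under /user/logs/
agent1.sinks = sink1
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = /user/logs/
# Files will be named log-<timestamp> inside the target directory
agent1.sinks.sink1.hdfs.filePrefix = log-
```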

4
Connect Source, Channel, and Sink
Connect the source src1, channel ch1, and sink sink1 in the agent agent1 by setting the source's channel to ch1 and the sink's channel to ch1.
💡 Hint: Sources and sinks must be connected to channels by setting sources.src1.channels and sinks.sink1.channel to the channel name. Note that the source property is channels (plural, since a source can feed several channels) while the sink property is channel (singular).
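The final wiring step adds two lines to the flume.conf sketch:

```properties
# Bind the source and sink to the channel, completing the pipeline
agent1.sources.src1.channels = ch1
agent1.sinks.sink1.channel = ch1
```

With all four sections in one file, the agent can then be started with the standard Flume launcher (directory and file names here are assumptions): flume-ng agent --conf conf --conf-file flume.conf --name agent1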