Estimated time: ~30 mins

NiFi for data flow automation in Hadoop - Mini Project: Build & Apply

📖 Scenario: You work in a company that collects sales data from multiple stores every day. The data arrives as CSV files in a folder. You want to automate the process of reading these files, filtering out sales below a certain amount, and saving the filtered data to another folder for analysis.
🎯 Goal: Build a simple NiFi data flow that reads CSV files from an input folder, filters sales records where the amount is greater than a threshold, and writes the filtered records to an output folder.
📋 What You'll Learn
Create a NiFi flow to read CSV files from a directory
Add a configuration variable for the sales amount threshold
Use a processor to filter records based on the threshold
Write the filtered records to an output directory
💡 Why This Matters
🌍 Real World
Automating data ingestion and filtering is common in retail and many industries to prepare data for analysis without manual work.
💼 Career
NiFi skills are valuable for data engineers and analysts who build reliable data pipelines and automate data workflows.
1
Set up the input data folder and sample CSV file
Create a directory called /data/input_sales and place a CSV file named sales.csv inside it with these exact contents:
store_id,product,amount
1,apple,50
2,banana,30
3,orange,70
Need a hint?

Use your terminal or file explorer to create the folder and file with the exact CSV content.
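From a terminal, the folder and file can be created with a short sketch like the one below. It uses a relative data/input_sales path so it runs anywhere; substitute /data/input_sales if you have write access to that location.

```shell
# Create the input folder and the sample sales.csv file.
# Note: a relative path is used here; the project itself uses /data/input_sales.
mkdir -p data/input_sales
cat > data/input_sales/sales.csv <<'EOF'
store_id,product,amount
1,apple,50
2,banana,30
3,orange,70
EOF
cat data/input_sales/sales.csv   # verify the file contents
```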

2
Configure the sales amount threshold variable
In NiFi, create a variable named sales_threshold and set its value to 40; the flow will keep only sales whose amount is greater than this threshold.
Need a hint?

Use NiFi's Variables tab in the Process Group to add sales_threshold with value 40.

3
Create the NiFi flow to filter sales records
Use the ListFile processor to list files in /data/input_sales, connect it to a FetchFile processor to read each file's contents (ListFile emits only file metadata, not content), then route the flow to a QueryRecord processor that filters records where amount > ${sales_threshold}. Configure the correct Record Reader and Record Writer controller services (e.g. CSVReader and CSVRecordSetWriter) for CSV format.
Need a hint?

Use NiFi UI to add and configure processors and controller services as described.
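The filter itself is a SQL statement added as a dynamic property on the QueryRecord processor; the property's name (here "filtered", an example name) becomes the name of the outgoing relationship. A typical query looks like:

```sql
-- Dynamic property "filtered" on QueryRecord.
-- FLOWFILE is the table name QueryRecord exposes for the incoming records;
-- ${sales_threshold} resolves to the variable configured in Step 2.
SELECT *
FROM FLOWFILE
WHERE amount > ${sales_threshold}
```

Route the "filtered" relationship onward to the next processor in the flow.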

4
Output the filtered sales records
Add a PutFile processor after QueryRecord to write the filtered CSV files to /data/output_sales. Then print the contents of the output CSV file to verify that only sales with amount greater than 40 were saved.
Need a hint?

Use NiFi to add PutFile processor and configure output folder. Then check the output file content.
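To sanity-check the result, you can reproduce the same "amount > 40" filter locally with awk; the file NiFi writes to /data/output_sales should match what this prints. The snippet recreates the sample input (under a relative path, so it is self-contained) before applying the filter.

```shell
# Recreate the sample input from Step 1.
mkdir -p data/input_sales
cat > data/input_sales/sales.csv <<'EOF'
store_id,product,amount
1,apple,50
2,banana,30
3,orange,70
EOF
# Keep the header plus any row whose amount column (field 3) exceeds 40.
# This prints the header, the apple row (50), and the orange row (70).
awk -F, 'NR==1 || $3+0 > 40' data/input_sales/sales.csv
```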