
NiFi for data flow automation in Hadoop

Introduction

NiFi helps move and manage data automatically between systems without manual work.

You want to collect data from many sources like files, databases, or sensors.
You need to clean or change data before saving it somewhere else.
You want to send data continuously in real-time to another system.
You want to track where data came from and where it goes.
You want to build a simple way to connect different data tools.
Syntax
NiFi uses a visual interface where you drag and drop components called processors to build a data flow.
Each processor does one task like reading, transforming, or writing data.
You connect processors with arrows to show how data moves.
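The processor-and-connection idea can be imitated in plain Python. This is a hypothetical sketch, not real NiFi code (real flows are built in the visual interface): each "processor" is a function that takes data and returns data, and connecting processors with arrows is just calling them in sequence.

```python
# Hypothetical sketch: each "processor" does one task,
# and a flow is a chain of processors connected in order.

def get_file(_):
    # Simulate a processor that reads lines from a source
    return ["line 1", "line 2"]

def convert_record(lines):
    # Simulate a processor that changes the data format
    return [{"text": line} for line in lines]

def put_storage(records):
    # Simulate a processor that writes records downstream
    for record in records:
        print(f"Wrote: {record}")
    return records

# The "arrows" between processors: pass each one's output to the next
flow = [get_file, convert_record, put_storage]
data = None
for processor in flow:
    data = processor(data)
```

Each function maps to one box on the NiFi canvas, and the loop plays the role of the connections between them.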
Examples
This flow reads files, converts their format, then saves them to Hadoop storage.
GetFile -> ConvertRecord -> PutHDFS
This flow listens for data from web requests, decides where to send it, then sends it to Kafka.
ListenHTTP -> RouteOnAttribute -> PublishKafka
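The routing step in this flow decides where each piece of data goes based on its attributes. Here is a hypothetical Python sketch of that idea (in real NiFi, RouteOnAttribute matches rules you configure in the UI, not Python code):

```python
# Hypothetical sketch of attribute-based routing: each event carries
# attributes, and a rule picks which destination it is sent to.

def route_on_attribute(event):
    # Send error events one way, everything else another way
    if event.get("status") == "error":
        return "errors"
    return "success"

events = [
    {"status": "ok", "body": "order placed"},
    {"status": "error", "body": "payment failed"},
]

routed = {"errors": [], "success": []}
for event in events:
    routed[route_on_attribute(event)].append(event)
```

In the NiFi flow above, the "errors" and "success" destinations would be separate connections leading to different downstream processors.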
Sample Program

This code simulates a simple data flow: reading data, changing it, and saving it.

# NiFi flows are built visually, but here is a simple Python example to show data flow logic

# Simulate reading data
data = ['apple', 'banana', 'cherry']

# Transform data to uppercase
transformed = [item.upper() for item in data]

# Simulate writing data
for item in transformed:
    print(f'Saved: {item}')
Output

Saved: APPLE
Saved: BANANA
Saved: CHERRY
Important Notes

NiFi is designed to be easy to use with no coding needed for many tasks.

NiFi tracks data provenance, so you can see where each piece of data came from and what happened to it if something goes wrong.

You can schedule flows to run automatically or trigger them by events.
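Scheduled flows run on a timer rather than waiting for someone to start them. A minimal sketch of timer-driven scheduling in Python (hypothetical; NiFi configures this per processor in the UI):

```python
import time

# Hypothetical sketch: run the flow body on a fixed interval,
# the way a NiFi processor can be scheduled to run periodically.

def run_flow():
    print("flow executed")

interval_seconds = 0.01  # tiny interval so this sketch finishes quickly
runs = 0
while runs < 3:
    run_flow()
    runs += 1
    time.sleep(interval_seconds)
```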

Summary

NiFi automates moving and changing data between systems.

It uses a visual drag-and-drop interface with processors connected by arrows.

NiFi helps handle data in real-time and keeps track of data paths.