Kafka integration with Hadoop moves data from real-time streams into HDFS for large-scale batch analysis. It connects fast data sources with Hadoop's storage and processing layers.
Kafka integration with Hadoop
kafka-console-consumer --bootstrap-server <kafka-broker> --topic <topic-name> --from-beginning | hadoop fs -appendToFile - <hdfs-path>
This command reads messages from a Kafka topic and pipes them into a file in HDFS; the `-` argument tells `hadoop fs -appendToFile` to read from standard input.
You can also use tools like Apache Flume or Apache NiFi for more complex integration.
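As a sketch of the Flume route, a minimal agent configuration (the file `flume-kafka-hdfs.conf` used later in this section) might look like the following; the broker address, topic name, and HDFS path are illustrative assumptions:

```properties
# Minimal Flume agent: Kafka source -> memory channel -> HDFS sink.
# Broker address, topic, and HDFS path below are illustrative assumptions.
agent1.sources  = kafkaSource
agent1.channels = memChannel
agent1.sinks    = hdfsSink

agent1.sources.kafkaSource.type = org.apache.flume.source.kafka.KafkaSource
agent1.sources.kafkaSource.kafka.bootstrap.servers = localhost:9092
agent1.sources.kafkaSource.kafka.topics = sensor-data
agent1.sources.kafkaSource.channels = memChannel

agent1.channels.memChannel.type = memory
agent1.channels.memChannel.capacity = 10000

agent1.sinks.hdfsSink.type = hdfs
agent1.sinks.hdfsSink.hdfs.path = /user/hadoop/sensor-data
agent1.sinks.hdfsSink.hdfs.fileType = DataStream
agent1.sinks.hdfsSink.channel = memChannel
```

Unlike the console-consumer pipe, a Flume agent handles batching, retries, and file rolling in HDFS, which makes it better suited to continuous pipelines.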
# Stream the 'sensor-data' topic into HDFS
kafka-console-consumer --bootstrap-server localhost:9092 --topic sensor-data --from-beginning | hadoop fs -appendToFile - /user/hadoop/sensor-data/data.txt

# Alternatively, run a Flume agent that reads from Kafka and writes to HDFS
flume-ng agent -n agent1 -c conf -f flume-kafka-hdfs.conf
The following script reads all messages from the Kafka topic 'test-topic', appends them to a file in HDFS, and then lists and displays the saved data. Note that the console consumer runs until interrupted (Ctrl+C); pass --max-messages or --timeout-ms if you want it to exit on its own.
# This example uses the Kafka console consumer and Hadoop fs commands
# Step 1: Start a Kafka consumer to read from topic 'test-topic'
# Step 2: Pipe the output to the HDFS file '/user/hadoop/test-topic-data/data.txt'
kafka-console-consumer --bootstrap-server localhost:9092 --topic test-topic --from-beginning | hadoop fs -appendToFile - /user/hadoop/test-topic-data/data.txt

# After running, check the data in HDFS
hadoop fs -ls /user/hadoop/test-topic-data
hadoop fs -cat /user/hadoop/test-topic-data/data.txt
Make sure Kafka and Hadoop services are running before integration.
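As a quick pre-flight check, a small shell sketch can confirm that the service ports are reachable before wiring up the pipeline. The hostnames and port numbers here are assumptions matching the earlier examples (Kafka's default broker port and a common HDFS NameNode RPC port):

```shell
#!/usr/bin/env bash
# Prints "open" if a TCP port accepts connections, "closed" otherwise.
# Uses bash's built-in /dev/tcp; the subshell closes the connection on exit.
check_port() {
  local host="$1" port="$2"
  if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
    echo "open"
  else
    echo "closed"
  fi
}

check_port localhost 9092   # Kafka broker (default port, assumed)
check_port localhost 8020   # HDFS NameNode RPC (common default, assumed)
```

If either check prints "closed", start the corresponding service before running the integration commands.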
Kafka message formats should match what downstream Hadoop jobs expect, for example newline-delimited JSON or Avro records.
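For example, a quick sanity check can verify that a message is valid JSON before it is produced to Kafka. The record shape below is a hypothetical sensor reading, and `python3` is assumed to be on the PATH:

```shell
#!/usr/bin/env bash
# Hypothetical sensor reading; the field names are illustrative assumptions.
record='{"sensor_id": "s1", "temperature": 21.5}'

# python3 -m json.tool exits non-zero on malformed input,
# so this distinguishes valid from invalid records.
if echo "$record" | python3 -m json.tool > /dev/null 2>&1; then
  echo "valid"
else
  echo "invalid"
fi
```

The same check can be applied per line to a whole batch file before loading it into HDFS.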
For large-scale or continuous data, use tools like Apache Flume or NiFi instead of simple console commands.
Kafka integration with Hadoop moves streaming data into big storage for analysis.
Use simple commands for small data or tools like Flume for production pipelines.
Check data in HDFS after transfer to confirm successful integration.