
Kafka integration with Hadoop - Practice Problems & Coding Challenges

Challenge - 5 Problems
🧠 Conceptual · Intermediate
Understanding Kafka and Hadoop Integration

Which component is primarily responsible for moving data from Kafka to Hadoop in a typical integration setup?

A. Kafka Connect with HDFS Sink Connector
B. Apache Spark Streaming directly writing to Kafka
C. Hadoop MapReduce reading from Kafka topics
D. Kafka Producer sending data to HDFS
💡 Hint

Think about the tool designed to connect Kafka with external systems like Hadoop.
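For context, a minimal HDFS Sink connector configuration is sketched below. All names and values here are illustrative assumptions, not part of the question; it simply shows the component that typically carries data from Kafka into HDFS:

```properties
# Hypothetical connector config; adjust the name, topic, and URL for your cluster.
name=example-hdfs-sink
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=2
topics=example-events
hdfs.url=hdfs://namenode:8020
flush.size=1000
format.class=io.confluent.connect.hdfs.avro.AvroFormat
```

Kafka Connect runs this as a managed sink task, so no custom MapReduce job or hand-written consumer is needed to land topic data in HDFS.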

Predict Output · Intermediate
Output of Kafka Connect HDFS Sink Configuration

Given the following Kafka Connect HDFS Sink configuration snippet, what is the expected output directory structure in HDFS?

name=hdfs-sink-connector
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
topics=test-topic
hdfs.url=hdfs://namenode:8020
flush.size=3
format.class=io.confluent.connect.hdfs.avro.AvroFormat
hdfs.topics.dir=/user/kafka/topics
A. /user/hdfs/data/test-topic/ with JSON files flushed every 3 records
B. /user/kafka/topics/test-topic/partition=0/ with Avro files flushed every 3 records
C. /tmp/kafka/test-topic/partition=1/ with Parquet files flushed every 1 record
D. /user/kafka/test-topic/partition=0/ with CSV files flushed every 5 records
💡 Hint

Look at the topic name, flush size, and format class in the config.
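The hint can be checked with a short sketch. This is not the connector's actual code, just a minimal Python model (the function name is an assumption) of the default layout `<hdfs.topics.dir>/<topic>/partition=<n>/`:

```python
def output_dir(topics_dir: str, topic: str, partition: int) -> str:
    """Model of the HDFS Sink's default path layout:
    <hdfs.topics.dir>/<topic>/partition=<n>/"""
    return f"{topics_dir}/{topic}/partition={partition}/"

# Values taken from the configuration in the question:
print(output_dir("/user/kafka/topics", "test-topic", 0))
# /user/kafka/topics/test-topic/partition=0/
```

With `flush.size=3` and `AvroFormat`, each such partition directory receives an Avro file roughly every 3 records.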

Data Output · Advanced
Data Output Format from Kafka to Hadoop

When using the Kafka Connect HDFS Sink with the default Avro format, how are schema and data stored in the output files?

A. Avro files containing both data and schema embedded in each file
B. Plain text files with JSON strings only
C. Binary files without any schema information
D. CSV files with header rows describing schema
💡 Hint

Consider how Avro format handles schema and data storage.
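Avro container files are self-describing: the writer's schema is serialized into the file header, so every file the sink writes carries both schema and records. A rough stdlib-only analogy (this is not real Avro encoding, just an illustration of a file that embeds its own schema):

```python
import io
import json

# Analogy only: a "file" whose header embeds the schema, followed by records,
# mirroring how an Avro container file stores its writer schema up front.
schema = {"type": "record", "name": "Event",
          "fields": [{"name": "id", "type": "int"}]}
records = [{"id": 1}, {"id": 2}]

buf = io.StringIO()
buf.write(json.dumps({"schema": schema}) + "\n")  # header with embedded schema
for r in records:
    buf.write(json.dumps(r) + "\n")

# A reader recovers the schema from the file itself, no side channel needed:
buf.seek(0)
header = json.loads(buf.readline())
assert header["schema"]["name"] == "Event"
```

Real Avro uses a compact binary encoding plus sync markers, but the key property tested here is the same: schema and data travel together in each file.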

🔧 Debug · Advanced
Identifying Error in Kafka to Hadoop Data Pipeline

Given a Kafka Connect HDFS Sink connector failing with error 'Failed to write to HDFS: Permission denied', what is the most likely cause?

A. HDFS Namenode is down
B. Kafka topic does not exist
C. Kafka Connect worker does not have write permissions on the target HDFS directory
D. Kafka Connect configuration missing flush.size parameter
💡 Hint

Think about file system permissions and access rights.
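A hedged troubleshooting sketch using the standard HDFS shell. The paths and the `connect` user below are assumptions for illustration; substitute the directory and service account from your own deployment:

```shell
# Inspect ownership and permissions on the sink's target directory:
hdfs dfs -ls /user/kafka/topics

# As an HDFS superuser, grant the Kafka Connect worker's user access:
hdfs dfs -chown -R connect:connect /user/kafka/topics
# or loosen directory permissions instead:
hdfs dfs -chmod -R 775 /user/kafka/topics
```

"Permission denied" is an authorization failure, so the Namenode is reachable; the fix is on the HDFS side, not in the connector config.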

🚀 Application · Expert
Optimizing Kafka to Hadoop Data Ingestion

You want to optimize the Kafka Connect HDFS Sink to reduce small file creation and improve throughput. Which configuration change is most effective?

A. Set topics to multiple unrelated topics
B. Decrease the tasks.max to 1
C. Change format.class to CSVFormat
D. Increase the flush.size parameter to a larger number
💡 Hint

Think about how flush.size affects file creation frequency.
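A quick back-of-the-envelope sketch (illustrative Python, not connector code) of why a larger flush.size yields fewer, larger files per topic-partition:

```python
import math

def files_written(num_records: int, flush_size: int) -> int:
    # The sink rolls a new output file roughly every flush.size records
    # per topic-partition, so larger flush.size => fewer, bigger files.
    return math.ceil(num_records / flush_size)

print(files_written(3000, 3))     # flush.size=3   -> 1000 small files
print(files_written(3000, 1000))  # flush.size=1000 -> 3 larger files
```

Fewer, larger files also reduce Namenode metadata pressure, which is the usual motivation for tuning flush.size upward in Kafka-to-HDFS pipelines.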