Which component is primarily responsible for moving data from Kafka to Hadoop in a typical integration setup?
Think about the tool designed to connect Kafka with external systems like Hadoop.
Kafka Connect with HDFS Sink Connector is designed to stream data from Kafka topics directly into Hadoop's HDFS storage.
Given the following Kafka Connect HDFS Sink configuration snippet, what is the expected output directory structure in HDFS?
name=hdfs-sink-connector
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
topics=test-topic
hdfs.url=hdfs://namenode:8020
flush.size=3
format.class=io.confluent.connect.hdfs.avro.AvroFormat
topics.dir=/user/kafka/topics
Look at the topic name, flush size, and format class in the config.
With the default partitioner, the connector writes to /user/kafka/topics/test-topic/partition={partition}/ (the configured topics directory, then the topic name, then one subdirectory per Kafka partition), committing a new Avro file after every 3 records as set by flush.size.
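Assuming the default partitioner and the configuration above, the resulting layout would look roughly like this (file names shown follow the connector's topic+partition+startOffset+endOffset naming convention; exact names depend on the offsets actually consumed):

```
/user/kafka/topics/test-topic/partition=0/test-topic+0+0000000000+0000000002.avro
/user/kafka/topics/test-topic/partition=0/test-topic+0+0000000003+0000000005.avro
/user/kafka/topics/test-topic/partition=1/test-topic+1+0000000000+0000000002.avro
```

Each file covers exactly three records, matching flush.size=3.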
When using Kafka Connect HDFS Sink with the default Avro format, what is the schema of the stored data files?
Consider how Avro format handles schema and data storage.
Avro data files embed the writer's schema in the file header along with the records, making each file self-describing and enabling schema evolution without any external metadata store.
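The idea can be illustrated with a short sketch. This is not the real Avro binary encoding; it simply mimics the Object Container File layout (schema in the header, records after it) to show why such files are self-describing, using only the Python standard library:

```python
import io
import json

# Hypothetical schema for illustration; real Avro schemas use the same
# JSON structure, but the on-disk encoding below is simplified.
SCHEMA = {
    "type": "record",
    "name": "Click",
    "fields": [
        {"name": "user", "type": "string"},
        {"name": "page", "type": "string"},
    ],
}

def write_container(records):
    """Write a schema header followed by the data, like an Avro data file."""
    buf = io.StringIO()
    buf.write(json.dumps({"avro.schema": SCHEMA}) + "\n")  # embedded schema
    for rec in records:
        buf.write(json.dumps(rec) + "\n")
    return buf.getvalue()

def read_container(blob):
    """Recover both schema and records from the file contents alone."""
    lines = blob.splitlines()
    header = json.loads(lines[0])
    records = [json.loads(line) for line in lines[1:]]
    return header["avro.schema"], records

blob = write_container([{"user": "alice", "page": "/home"}])
schema, records = read_container(blob)
print(schema["name"])      # → Click
print(records[0]["user"])  # → alice
```

Because the reader recovers the schema from the file itself, a newer reader schema can still decode older files, which is the basis of Avro's schema evolution.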
Given a Kafka Connect HDFS Sink connector failing with error 'Failed to write to HDFS: Permission denied', what is the most likely cause?
Think about file system permissions and access rights.
The error indicates that the user the connector runs as lacks write permission on the target HDFS directory (the configured topics directory or its parents), so HDFS rejects the write.
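A typical diagnosis looks like the following. These commands are illustrative: the kafka user and the /user/kafka/topics path are assumptions matching the earlier configuration, and the chown must be run as an HDFS superuser:

```
# Inspect ownership and mode of the target directory
hdfs dfs -ls /user/kafka

# Ensure the directory exists and is writable by the connector's user
hdfs dfs -mkdir -p /user/kafka/topics
hdfs dfs -chown -R kafka:kafka /user/kafka/topics
```

If Kerberos is enabled, also verify that the connector's principal maps to the expected HDFS user.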
You want to optimize the Kafka Connect HDFS Sink to reduce small file creation and improve throughput. Which configuration change is most effective?
Think about how flush.size affects file creation frequency.
Increasing flush.size makes the connector buffer more records before committing, so it writes larger files less frequently; this reduces small-file overhead on HDFS and improves throughput, at the cost of higher end-to-end latency before data is visible.
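A sketch of a tuned configuration fragment (the values are illustrative, not recommendations; rotate.interval.ms is the connector's time-based rotation setting):

```
# Commit larger files less often
flush.size=10000
# Also rotate on time so low-traffic partitions still commit files regularly
rotate.interval.ms=600000
```

Pairing a large flush.size with time-based rotation avoids records sitting unflushed indefinitely on slow topics.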