Which of the following best describes a Resilient Distributed Dataset (RDD) in Apache Spark?
Think about how Spark handles data across multiple machines and recovers from failures.
An RDD is a fundamental Spark data structure that is distributed across cluster nodes and can recover from node failures automatically, making it fault-tolerant. Spark achieves this by recording each RDD's lineage (the sequence of transformations used to build it), so lost partitions can be recomputed from their source rather than restored from replicas.
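A minimal pure-Python sketch of the lineage idea behind RDD fault tolerance (this is an illustration, not actual Spark code): each dataset records its parent and the transformation that produced it, so any result can be recomputed from the source after a failure.

```python
# Pure-Python sketch of RDD-style lineage (NOT actual Spark internals).
# A dataset remembers its parent and the transformation applied to it,
# so results can always be recomputed from the lineage chain.
class LineageDataset:
    def __init__(self, data=None, parent=None, fn=None):
        self.data = data      # source data (None for derived datasets)
        self.parent = parent  # parent dataset in the lineage chain
        self.fn = fn          # transformation applied to the parent

    def map(self, fn):
        # A transformation does no work yet; it just extends the lineage.
        return LineageDataset(parent=self, fn=fn)

    def compute(self):
        # Recompute from the lineage chain rather than cached results.
        if self.parent is None:
            return list(self.data)
        return [self.fn(x) for x in self.parent.compute()]

base = LineageDataset(data=[1, 2, 3])
doubled = base.map(lambda x: x * 2)
# Even if intermediate results were lost, compute() rebuilds them.
print(doubled.compute())  # [2, 4, 6]
```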
What does it mean that RDDs are immutable in Apache Spark?
Consider how Spark manages data consistency and fault tolerance.
RDDs are immutable, meaning you cannot change their data after creation. Instead, transformations produce new RDDs, which helps with fault tolerance and consistency.
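The same pattern can be seen with an immutable Python tuple, as a rough analogue (a sketch, not Spark code): a "transformation" builds a new collection and the original is never modified.

```python
# Pure-Python analogue of RDD immutability (NOT actual Spark code):
# a transformation returns a new collection; the source is untouched.
original = (1, 2, 3)  # tuple: immutable, like an RDD's data
transformed = tuple(x * 10 for x in original)  # "transformation" -> new dataset

print(original)     # (1, 2, 3) -- unchanged after the transformation
print(transformed)  # (10, 20, 30)
```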
What is the output of the following Spark code snippet?
rdd = sc.parallelize([1, 2, 3, 4, 5])
result = rdd.filter(lambda x: x % 2 == 0).collect()
print(result)
Filter keeps elements where the condition is true.
The filter keeps only even numbers (divisible by 2), so the output is [2, 4].
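The same predicate can be checked with a plain-Python equivalent of `filter(...).collect()` (an analogue for verification, not PySpark itself):

```python
# Plain-Python equivalent of rdd.filter(lambda x: x % 2 == 0).collect().
data = [1, 2, 3, 4, 5]
result = [x for x in data if x % 2 == 0]  # keep elements where the condition is True
print(result)  # [2, 4]
```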
Given the following RDD and action, what is the output?
rdd = sc.parallelize([10, 20, 30, 40])
sum_value = rdd.reduce(lambda a, b: a + b)
print(sum_value)
Reduce combines all elements using the given function.
The reduce sums all elements: 10 + 20 + 30 + 40 = 100.
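The arithmetic can be confirmed with Python's own `functools.reduce`, which applies the same pairwise combining function (a local analogue, not a Spark cluster run):

```python
from functools import reduce

# Plain-Python equivalent of rdd.reduce(lambda a, b: a + b).
data = [10, 20, 30, 40]
sum_value = reduce(lambda a, b: a + b, data)  # ((10 + 20) + 30) + 40
print(sum_value)  # 100
```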
What error will the following Spark code produce?
rdd = sc.parallelize([1, 2, 3])
result = rdd.map(lambda x: x / 0).collect()
print(result)
Consider what happens when dividing by zero in Python.
Dividing by zero raises a ZeroDivisionError in Python. Because transformations are lazy, the error surfaces only when the collect() action runs the map tasks, at which point the failing tasks cause the Spark job to abort with this error.
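The failure mode can be reproduced locally with plain Python, since the lambda from the question raises the exception as soon as it is applied (a local demonstration, not a Spark run):

```python
# Plain-Python demonstration: applying x / 0 raises ZeroDivisionError
# on the very first element.
data = [1, 2, 3]
error_name = None
try:
    result = [x / 0 for x in data]
except ZeroDivisionError as e:
    error_name = type(e).__name__
print(error_name)  # ZeroDivisionError
```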