Apache Spark · ~20 mins

What is an RDD (Resilient Distributed Dataset) in Apache Spark - Practice Questions & Exercises

Challenge - 5 Problems
🎖️ RDD Mastery Badge: get all challenges correct to earn this badge!
🧠 Conceptual · intermediate
Understanding RDD Characteristics

Which of the following best describes a Resilient Distributed Dataset (RDD) in Apache Spark?

A. A fault-tolerant collection of elements that can be operated on in parallel across a cluster.
B. A single-node data storage system optimized for fast reads.
C. A graphical user interface for managing Spark jobs.
D. A type of database used to store structured data only.
💡 Hint

Think about how Spark handles data across multiple machines and recovers from failures.
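As a rough local analogy (plain Python, no Spark required), a dataset can be split into chunks and each chunk processed independently, much as Spark partitions an RDD across executors; the worker count and chunking scheme below are illustrative, not Spark's actual mechanism:

```python
from concurrent.futures import ThreadPoolExecutor

data = list(range(10))

# Split the data into "partitions", as Spark does with an RDD.
partitions = [data[i::4] for i in range(4)]

def process(chunk):
    # Each chunk is handled independently; if one task fails,
    # only that chunk would need recomputing.
    return [x * x for x in chunk]

with ThreadPoolExecutor(max_workers=4) as pool:
    squared = [x for chunk in pool.map(process, partitions) for x in chunk]

print(sorted(squared))
```

Spark adds the "resilient" part on top of this picture: it records the lineage of transformations so lost partitions can be rebuilt rather than kept in replicated storage.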

🧠 Conceptual · intermediate
RDD Immutability

What does it mean that RDDs are immutable in Apache Spark?

A. Once created, an RDD cannot be changed; transformations create new RDDs instead.
B. RDDs automatically delete old data after processing.
C. RDDs can be updated in place to save memory.
D. RDDs allow direct modification of data on cluster nodes.
💡 Hint

Consider how Spark manages data consistency and fault tolerance.
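A minimal local sketch of the same idea (plain Python, not Spark): a transformation builds a new collection and leaves the original untouched, just as a transformation on an RDD yields a new RDD:

```python
original = (1, 2, 3)  # a tuple, like an RDD, cannot be modified in place

# A "transformation" produces a new collection; the source is unchanged.
transformed = tuple(x * 10 for x in original)

print(original)     # the source data is intact
print(transformed)  # the derived data lives in a new object
```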

Predict Output · advanced
RDD Transformation Output

What is the output of the following Spark code snippet?

PySpark
rdd = sc.parallelize([1, 2, 3, 4, 5])
result = rdd.filter(lambda x: x % 2 == 0).collect()
print(result)
A. [1, 2, 3, 4, 5]
B. [2, 4]
C. [1, 3, 5]
D. []
💡 Hint

Filter keeps elements where the condition is true.
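To experiment without a Spark cluster, Python's built-in `filter` behaves the same way: it keeps only the elements for which the predicate returns `True`. Shown here on different data than the question, so you can verify the semantics without spoiling the answer:

```python
words = ["spark", "rdd", "cluster", "map"]

# Keep only the words longer than three characters.
kept = list(filter(lambda w: len(w) > 3, words))
print(kept)  # ['spark', 'cluster']
```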

Predict Output · advanced
RDD Action Result

Given the following RDD and action, what is the output?

PySpark
rdd = sc.parallelize([10, 20, 30, 40])
sum_value = rdd.reduce(lambda a, b: a + b)
print(sum_value)
A. TypeError
B. 40
C. 10
D. 100
💡 Hint

Reduce combines all elements using the given function.
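Spark's `reduce` action mirrors `functools.reduce` from the Python standard library; a quick local check on different data than the question shows how the binary function folds a collection down to a single value:

```python
from functools import reduce

# reduce applies the function pairwise: ((1 * 2) * 3) * 4
values = [1, 2, 3, 4]
product = reduce(lambda a, b: a * b, values)
print(product)  # 24
```

Note that Spark requires the function to be commutative and associative, since partitions are reduced in parallel and the partial results combined in no fixed order.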

🔧 Debug · expert
Identifying Error in RDD Code

What error will the following Spark code produce?

PySpark
rdd = sc.parallelize([1, 2, 3])
result = rdd.map(lambda x: x / 0).collect()
print(result)
A. ValueError
B. TypeError
C. ZeroDivisionError
D. No error, outputs [inf, inf, inf]
💡 Hint

Consider what happens when dividing by zero in Python.
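You can check Python's division-by-zero behavior locally before reasoning about the Spark job; when a task raises inside a transformation, Spark surfaces the exception to the driver once an action such as `collect()` forces execution. This plain-Python check catches the base class `ArithmeticError` and inspects the concrete exception type:

```python
# Unlike IEEE-754 float semantics in some languages, Python raises
# an exception on division by zero instead of returning inf.
try:
    result = 1 / 0
except ArithmeticError as exc:
    result = type(exc).__name__

print(result)
```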