Challenge - 5 Problems
RDD Mastery Badge
Get all challenges correct to earn this badge!
Test your skills under time pressure!
❓ Predict Output
intermediate · 2:00 remaining
Output of RDD count after filtering
What is the output of the following Spark code snippet?
Apache Spark
val sc = spark.sparkContext
val data = List(1, 2, 3, 4, 5, 6)
val rdd = sc.parallelize(data)
val filteredRDD = rdd.filter(x => x % 2 == 0)
val count = filteredRDD.count()
println(count)
Attempts:
2 left
💡 Hint
Count how many numbers in the list are even.
✗ Incorrect
The list has 6 numbers. Even numbers are 2, 4, and 6, so count is 3.
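The same filter-then-count logic can be checked locally with plain Scala collections (a sketch that needs no SparkContext; RDD.filter applies the same predicate element by element):

```scala
// Local sketch of the RDD logic using a plain Scala List.
// filter(x => x % 2 == 0) keeps the even elements; size plays the role of count().
val data = List(1, 2, 3, 4, 5, 6)
val evens = data.filter(x => x % 2 == 0)
println(evens.size) // prints 3
```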
❓ Data Output
intermediate · 1:30 remaining
Result of reading a text file into an RDD
Given a text file with 4 lines, what will be the number of elements in the RDD after reading it with sc.textFile?
Apache Spark
val rdd = sc.textFile("/path/to/file.txt")
rdd.count()
Attempts:
2 left
💡 Hint
Each line in the file becomes one element in the RDD.
✗ Incorrect
sc.textFile reads the file line by line, so the RDD has one element per line.
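The per-line behavior can be mimicked locally by splitting a string on newlines (a sketch; sc.textFile itself returns an RDD[String] with one element per line of the file):

```scala
// A 4-line "file" held as an in-memory string.
val fileContents = "line1\nline2\nline3\nline4"
// sc.textFile would yield one RDD element per line; splitting on '\n' mirrors that.
val lines = fileContents.split("\n")
println(lines.length) // prints 4
```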
🔧 Debug
advanced · 2:00 remaining
Identify the error in RDD creation from a collection
What error will this Spark code produce?
Apache Spark
val data = 1 to 5
val rdd = sc.parallelize(data.toString())
rdd.collect().foreach(println)
Attempts:
2 left
💡 Hint
Check what data.toString returns and what parallelize expects.
✗ Incorrect
data.toString converts the range to a single String, so parallelize treats it as a sequence of characters and builds an RDD[Char] instead of the intended RDD[Int].
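The type mix-up is visible with plain Scala, no Spark needed: toString on a Range yields a String, and a String viewed as a sequence yields Chars, not Ints (a local sketch; the exact toString text varies by Scala version):

```scala
val data = 1 to 5
// toString() collapses the whole range into one String
// (its exact contents depend on the Scala version).
val asString: String = data.toString()
// A String treated as a Seq yields characters, which is why
// sc.parallelize(data.toString()) would create an RDD[Char].
val chars: Seq[Char] = asString.toSeq
println(chars.take(5))
```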
🧠 Conceptual
advanced · 1:30 remaining
Difference between sc.parallelize and sc.textFile
Which statement correctly describes the difference between sc.parallelize and sc.textFile?
Attempts:
2 left
💡 Hint
Think about the source of data for each method.
✗ Incorrect
sc.parallelize is used to create RDDs from in-memory collections, while sc.textFile reads data from files.
❓ Visualization
expert · 2:30 remaining
Visualizing partitions of an RDD created from a file
You create an RDD from a text file with 100 lines using sc.textFile with 4 partitions. Which visualization best represents the distribution of lines across partitions?
Attempts:
2 left
💡 Hint
Partitions try to split data evenly.
✗ Incorrect
sc.textFile divides the file into roughly equal byte ranges across the specified partitions, so with 100 lines and 4 partitions each partition holds about 25 lines.
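The even split can be sketched locally (an approximation: sc.textFile actually partitions by byte ranges, so real partition counts can be off by a line or two):

```scala
// 100 "lines" distributed evenly over 4 partitions.
val lines = (1 to 100).toList
val numPartitions = 4
// grouped(25) models an even split into 4 chunks of 25 lines each.
val partitionSizes = lines.grouped(lines.size / numPartitions).map(_.size).toList
println(partitionSizes) // prints List(25, 25, 25, 25)
```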