Apache Spark · ~20 mins

Creating RDDs from collections and files in Apache Spark - Practice Exercises

Challenge - 5 Problems
Predict Output (intermediate)
Output of RDD count after filtering
What is the output of the following Spark code snippet?
Apache Spark
val sc = spark.sparkContext
val data = List(1, 2, 3, 4, 5, 6)
val rdd = sc.parallelize(data)
val filteredRDD = rdd.filter(x => x % 2 == 0)
val count = filteredRDD.count()
println(count)
A. 4
B. 6
C. 2
D. 3
💡 Hint: Count how many numbers in the list are even.
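To check the prediction without a cluster, the same filter can be run on the plain Scala collection; this is a sketch of the logic the RDD version distributes:

```scala
val data = List(1, 2, 3, 4, 5, 6)
// filter keeps elements where the predicate is true; 2, 4 and 6 are even
val evens = data.filter(x => x % 2 == 0)
println(evens.size) // prints 3, matching filteredRDD.count()
```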
Data Output (intermediate)
Result of reading a text file into an RDD
Given a text file with 4 lines, what will be the number of elements in the RDD after reading it with sc.textFile?
Apache Spark
val rdd = sc.textFile("/path/to/file.txt")
rdd.count()
A. 1
B. 4
C. 0
D. Depends on file size
💡 Hint: Each line in the file becomes one element in the RDD.
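As a sketch (assuming `sc` is an active SparkContext and the file at that path holds exactly 4 lines), the count reflects the number of lines, not the file's size in bytes:

```scala
// textFile produces an RDD[String] with one element per line of the file
val rdd = sc.textFile("/path/to/file.txt")
// With a 4-line file, count() returns 4 regardless of how many bytes each line holds
println(rdd.count())
```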
🔧 Debug (advanced)
Identify the error in RDD creation from a collection
What error will this Spark code produce?
Apache Spark
val data = 1 to 5
val rdd = sc.parallelize(data.toString())
rdd.collect().foreach(println)
A. The RDD contains the characters of the string 'Range(1,2,3,4,5)'
B. Compilation error: toString cannot be used here
C. Runtime error: Unsupported operation on Range
D. The RDD contains the numbers 1 to 5 as expected
💡 Hint: Check what data.toString returns and what parallelize expects.
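A sketch of the bug and its fix, assuming `sc` is an active SparkContext (note the exact `Range` string varies by Scala version; Scala 2.13 prints "Range 1 to 5" rather than "Range(1, 2, 3, 4, 5)"):

```scala
val data = 1 to 5
// data.toString yields a String such as "Range(1, 2, 3, 4, 5)".
// parallelize takes a Seq[T], and a String is implicitly a Seq[Char],
// so this compiles but the RDD holds individual characters, not numbers.
val wrong = sc.parallelize(data.toString) // RDD[Char]
// Fix: pass the collection itself
val fixed = sc.parallelize(data)          // RDD[Int] with elements 1 to 5
```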
🧠 Conceptual (advanced)
Difference between sc.parallelize and sc.textFile
Which statement correctly describes the difference between sc.parallelize and sc.textFile?
A. Both create RDDs from collections, but sc.textFile supports filtering
B. sc.parallelize reads data from a file; sc.textFile creates an RDD from a collection
C. sc.parallelize creates an RDD from an existing collection in memory; sc.textFile reads data from a file into an RDD
D. Both create RDDs from files, but sc.parallelize is faster
💡 Hint: Think about the source of data for each method.
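The two creation paths can be sketched side by side (assuming `sc` is an active SparkContext; the file path is illustrative):

```scala
// From an in-memory collection: the driver already holds the data
val fromMemory = sc.parallelize(Seq(1, 2, 3))   // RDD[Int]

// From external storage: the data lives in a file, read lazily on demand
val fromFile = sc.textFile("/path/to/file.txt") // RDD[String], one element per line
```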
Visualization (expert)
Visualizing partitions of an RDD created from a file
You create an RDD from a text file with 100 lines using sc.textFile with 4 partitions. Which visualization best represents the distribution of lines across partitions?
A. A bar chart with 4 bars, each showing approximately 25 lines
B. A pie chart with 100 equal slices, one per line
C. A line chart showing the number of lines per partition increasing from 1 to 4
D. A scatter plot with 100 randomly scattered points
💡 Hint: Partitions try to split data evenly.
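The per-partition distribution can be inspected directly with glom(), which collects each partition into an array (a sketch assuming `sc` is active and the file holds 100 lines):

```scala
val rdd = sc.textFile("/path/to/file.txt", minPartitions = 4)
// glom() turns each partition into an Array, exposing per-partition line counts
val linesPerPartition = rdd.glom().map(_.length).collect()
// Typically close to Array(25, 25, 25, 25); exact sizes depend on byte-based
// input splits, so partitions are only approximately equal.
println(linesPerPartition.mkString(", "))
```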