Challenge - 5 Problems

Accumulator Mastery

❓ Predict Output

intermediate

Output of Spark accumulator after RDD operations

Consider the following Spark code that uses an accumulator to count even numbers in an RDD.

val accum = sc.longAccumulator("evenCount")
val rdd = sc.parallelize(1 to 5)
rdd.foreach(x => if (x % 2 == 0) accum.add(1))
println(accum.value)

What will be printed?

❓ Data Output

intermediate

Accumulator value after multiple actions

Given this Spark code snippet:

val accum = sc.longAccumulator("sumAccumulator")
val rdd = sc.parallelize(Seq(1, 2, 3))
rdd.foreach(x => accum.add(x))
rdd.foreach(x => accum.add(x * 2))
println(accum.value)

What is the value printed?

A. 12

B. 18

C. 21

🔧 Debug

advanced

Identify the error in accumulator usage

What error, if any, will this Spark code produce?

val accum = sc.longAccumulator("accum")
val rdd = sc.parallelize(1 to 3)
val result = rdd.map(x => accum.add(x)).collect()
println(accum.value)

B. Compilation error

D. Runtime error: accumulator cannot be used in map

🧠 Conceptual

advanced

Why accumulators are not reliable for transformations

Why is it unsafe to update Spark accumulators inside transformations such as map or filter when their value feeds logic that affects the output?

A. Because transformations are lazy and may be recomputed, causing accumulator updates to be counted multiple times

B. Because accumulators only work on the driver node and not on executors

C. Because accumulators cannot be created inside transformations

D. Because accumulators reset automatically after each transformation
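
To experiment with how accumulator updates inside a transformation interact with lazy evaluation, a minimal sketch along the following lines may help. It assumes a local Spark installation on the classpath; the object name `AccumulatorRecomputeSketch`, the helper `observe`, the accumulator name `seen`, and the `local[2]` master are all hypothetical choices made for illustration and do not come from the quiz.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object AccumulatorRecomputeSketch {

  // Returns the accumulator value read at three points:
  // before any action, after one action, and after a second action.
  def observe(sc: SparkContext): (Long, Long, Long) = {
    val seen = sc.longAccumulator("seen") // hypothetical accumulator name

    // The accumulator is updated as a side effect inside a transformation.
    val mapped = sc.parallelize(1 to 4).map { x =>
      seen.add(1)
      x * 10
    }

    val beforeAction = seen.value.longValue // map is lazy, nothing has run yet

    mapped.count()                          // first action evaluates the map
    val afterFirst = seen.value.longValue

    mapped.count()                          // not cached, so the map is evaluated again
    val afterSecond = seen.value.longValue

    (beforeAction, afterFirst, afterSecond)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master with two threads is enough for this sketch.
    val conf = new SparkConf().setMaster("local[2]").setAppName("accumulator-sketch")
    val sc = new SparkContext(conf)
    try println(observe(sc)) finally sc.stop()
  }
}
```

Comparing the three readings against the number of elements in the RDD shows whether each action re-ran the transformation. Moving the update into an action such as `foreach` is the usual way to make each element count once per run.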

🚀 Application

expert

Using accumulators to count errors in a Spark job

You want to count how many lines in a text file contain the word "error" using Spark accumulators. Which code snippet correctly counts the occurrences?

val errorCount = sc.longAccumulator("errorCount")
val lines = sc.textFile("log.txt")
// Which option correctly updates errorCount?

val errors = lines.filter(line => line.contains("error"))
errors.foreach(_ => errorCount.add(1))
println(errorCount.value)

val errors = lines.filter(line => { if (line.contains("error")) errorCount.add(1); true })
errors.count()
println(errorCount.value)

lines.foreach(line => if (line.contains("error")) errorCount.add(1))
println(errorCount.value)

val errors = lines.map(line => if (line.contains("error")) errorCount.add(1))
println(errorCount.value)
