Consider the following Spark code that uses an accumulator to count even numbers in an RDD.
val accum = sc.longAccumulator("evenCount")
val rdd = sc.parallelize(1 to 5)
rdd.foreach(x => if (x % 2 == 0) accum.add(1))
println(accum.value)What will be printed?
val accum = sc.longAccumulator("evenCount") val rdd = sc.parallelize(1 to 5) rdd.foreach(x => if (x % 2 == 0) accum.add(1)) println(accum.value)
Count how many numbers between 1 and 5 are even.
The even numbers between 1 and 5 are 2 and 4, so the accumulator counts 2.
Given this Spark code snippet:
val accum = sc.longAccumulator("sumAccumulator")
val rdd = sc.parallelize(Seq(1, 2, 3))
rdd.foreach(x => accum.add(x))
rdd.foreach(x => accum.add(x * 2))
println(accum.value)What is the value printed?
val accum = sc.longAccumulator("sumAccumulator") val rdd = sc.parallelize(Seq(1, 2, 3)) rdd.foreach(x => accum.add(x)) rdd.foreach(x => accum.add(x * 2)) println(accum.value)
Sum all elements, then sum all elements multiplied by 2.
Sum of 1+2+3 = 6, sum of 2+4+6 = 12, total 6+12=18.
What error will this Spark code produce?
val accum = sc.longAccumulator("accum")
val rdd = sc.parallelize(1 to 3)
val result = rdd.map(x => accum.add(x)).collect()
println(accum.value)val accum = sc.longAccumulator("accum") val rdd = sc.parallelize(1 to 3) val result = rdd.map(x => accum.add(x)).collect() println(accum.value)
Consider how accumulators behave in transformations and actions.
Accumulator updates inside map are executed on workers; collect triggers the action, so accumulator sums 1+2+3=6.
Why should Spark accumulators not be used to update variables inside transformations like map or filter for logic that affects output?
Think about Spark's lazy evaluation and task retries.
Transformations can be recomputed multiple times, so accumulator updates inside them may be counted more than once, making them unreliable for logic affecting output.
You want to count how many lines in a text file contain the word "error" using Spark accumulators. Which code snippet correctly counts the occurrences?
val errorCount = sc.longAccumulator("errorCount")
val lines = sc.textFile("log.txt")
// Which option correctly updates errorCount?val errorCount = sc.longAccumulator("errorCount") val lines = sc.textFile("log.txt")
Remember that accumulators update reliably only in actions, not transformations.
Option C uses foreach (an action) to update the accumulator reliably. Options B and D update accumulators inside transformations or after transformations without actions, which may cause incorrect counts. Option C uses map but does not trigger an action to update the accumulator.