Recall & Review
beginner
What is an accumulator variable in Apache Spark?
An accumulator variable is a shared variable used to count or sum values across the tasks of a distributed Spark job. Tasks add to it in parallel and the driver reads the aggregated result, so information like counts or sums can be tracked safely across many workers.
beginner
How do accumulator variables behave in Spark tasks?
Inside tasks, accumulator variables are write-only: tasks can only add to them. The aggregated value only becomes reliable on the driver once the job finishes, which prevents conflicting updates from tasks running in parallel.
intermediate
Why should accumulator variables only be used for adding or counting?
Because accumulators merge per-task contributions in whatever order tasks happen to finish, the operation must be commutative and associative, like addition. Order-sensitive operations such as subtraction give different results for different task orderings, and task retries can replay updates, so anything other than adding can silently produce incorrect results.
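The order-sensitivity argument above can be checked with a small pure-Python sketch (no Spark needed): merging per-task partial results with addition gives the same total for every arrival order, while subtraction does not.

```python
from functools import reduce
from itertools import permutations

# Per-task partial results, which may reach the driver in any order.
partials = [3, 5, 2]

# Addition is commutative and associative: every merge order agrees.
sums = {reduce(lambda a, b: a + b, p) for p in permutations(partials)}
print(sums)   # {10}

# Subtraction is neither, so the merged result depends on task order.
diffs = {reduce(lambda a, b: a - b, p) for p in permutations(partials)}
print(diffs)  # three different answers: {0, -4, -6}
```

This is exactly why accumulators restrict tasks to add-only updates.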
beginner
Show a simple example of creating and using an accumulator in Spark with Python.
In PySpark, you create an accumulator with sc.accumulator(0). Then inside an RDD action such as foreach, you add to it with accum.add(1). After the job completes, you read the result on the driver with accum.value.
intermediate
What happens if you try to read an accumulator's value inside a Spark task?
You cannot read an accumulator's value inside tasks: PySpark raises an error if a task touches accum.value, and in general the up-to-date value only exists on the driver after the job completes. Only the driver should read the value.
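A minimal pure-Python sketch of this rule (a hypothetical toy Accumulator class, not the real pyspark one, though PySpark likewise raises an exception when .value is accessed inside a task):

```python
class Accumulator:
    """Toy model of a Spark accumulator: tasks may only add;
    only the driver may read the value."""

    def __init__(self, init):
        self._value = init
        self._on_driver = True   # cleared when a copy is shipped to a task

    def add(self, term):
        self._value += term      # the one operation tasks are allowed

    @property
    def value(self):
        if not self._on_driver:
            raise RuntimeError("Accumulator.value cannot be read inside a task")
        return self._value


accum = Accumulator(0)
accum.add(4)
print(accum.value)        # 4: reading on the driver is fine

accum._on_driver = False  # pretend this copy now lives inside a worker task
try:
    accum.value
except RuntimeError as err:
    print(err)            # reading inside a task fails
```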
What is the main use of accumulator variables in Spark?
Accumulators are designed to safely aggregate counts or sums from many parallel tasks.
Which operation is safe to perform on accumulators inside Spark tasks?
Only addition (or other commutative and associative operations, like summing and counting) is safe with accumulators; order-sensitive operations break under parallelism and task retries.
When can you reliably read the value of an accumulator in Spark?
Accumulator values are aggregated on the driver; you can rely on the value only after the job has completed, not during task execution.
What happens if a Spark task is retried when using accumulators?
Retried tasks can re-apply their accumulator additions, counting values more than once. Spark guarantees exactly-once accumulator updates only for updates made inside actions; updates made in transformations may be applied again on retry.
Which Spark context method is used to create an accumulator in PySpark?
The sc.accumulator() method creates an accumulator variable.
Explain what accumulator variables are and why they are useful in Spark.
Think about how you count things safely when many people work in parallel.
Describe best practices and limitations when using accumulator variables in Spark.
Consider what can go wrong if you try to do other math or read values too early.