Challenge - 5 Problems
Master of Reduce and Aggregate Actions
Get all challenges correct to earn this badge!
Test your skills under time pressure!
❓ Predict Output
Intermediate · 2:00 time limit
What is the output of this Spark reduce operation?
Given the following Spark code, what will be the output printed?
Apache Spark
from pyspark.sql import SparkSession

spark = SparkSession.builder.master('local').appName('Test').getOrCreate()
rdd = spark.sparkContext.parallelize([1, 2, 3, 4])
result = rdd.reduce(lambda x, y: x + y)
print(result)
💡 Hint
Think about what reduce does with the lambda function summing two numbers.
✅ Explanation
The reduce action applies the lambda function cumulatively to the elements of the RDD, summing all numbers: 1 + 2 + 3 + 4 = 10. The output is 10.
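Spark's reduce folds the RDD's elements pairwise with the supplied function. The same folding behavior can be sketched without a cluster using Python's built-in functools.reduce (a plain-Python analogy, not Spark itself):

```python
from functools import reduce

# Fold the list pairwise with the same lambda Spark would apply:
# ((1 + 2) + 3) + 4
result = reduce(lambda x, y: x + y, [1, 2, 3, 4])
print(result)  # → 10
```

Because Spark may combine partial results from different partitions in any order, the function passed to reduce should be commutative and associative, as addition is here.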
❓ Data Output
Intermediate · 2:00 time limit
What is the result of this aggregate action?
Consider this Spark code using aggregate to compute sum and count. What is the final output?
Apache Spark
from pyspark.sql import SparkSession

spark = SparkSession.builder.master('local').appName('Test').getOrCreate()
rdd = spark.sparkContext.parallelize([1, 2, 3, 4])
zero_value = (0, 0)
seq_op = lambda acc, x: (acc[0] + x, acc[1] + 1)
comb_op = lambda acc1, acc2: (acc1[0] + acc2[0], acc1[1] + acc2[1])
result = rdd.aggregate(zero_value, seq_op, comb_op)
print(result)
💡 Hint
aggregate folds each element into the accumulator with seq_op, then merges per-partition accumulators with comb_op, producing a (sum, count) pair here.
✅ Explanation
The aggregate sums all elements (10) and counts them (4), returning the tuple (10, 4).
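aggregate runs seq_op within each partition and comb_op across partitions. A minimal plain-Python simulation of that two-phase fold (assuming, for illustration, the data is split into two partitions) looks like this:

```python
from functools import reduce as fold

data = [1, 2, 3, 4]
zero = (0, 0)                                   # (running_sum, running_count)
seq_op = lambda acc, x: (acc[0] + x, acc[1] + 1)
comb_op = lambda a, b: (a[0] + b[0], a[1] + b[1])

# Phase 1: fold each simulated partition with seq_op, starting from zero.
partitions = [data[:2], data[2:]]
partials = [fold(seq_op, part, zero) for part in partitions]  # [(3, 2), (7, 2)]

# Phase 2: merge the per-partition accumulators with comb_op.
result = fold(comb_op, partials, zero)
print(result)  # → (10, 4)
```

Note that seq_op and comb_op can have different signatures: seq_op combines an accumulator with a raw element, while comb_op combines two accumulators, which is why aggregate can return a type different from the RDD's element type.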
❓ Predict Output
Advanced · 2:00 time limit
What is the output of this Spark reduceByKey operation?
Given the following Spark code, what will be the output printed?
Apache Spark
from pyspark.sql import SparkSession

spark = SparkSession.builder.master('local').appName('Test').getOrCreate()
rdd = spark.sparkContext.parallelize([('a', 1), ('b', 2), ('a', 3)])
result = rdd.reduceByKey(lambda x, y: x + y).collect()
print(result)
💡 Hint
reduceByKey sums values for each key.
✅ Explanation
reduceByKey applies the function to the values of each key: 'a' -> 1 + 3 = 4, 'b' -> 2, so collect returns [('a', 4), ('b', 2)]. (The ordering of pairs returned by collect is not guaranteed.)
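The per-key folding that reduceByKey performs can be sketched in plain Python with a dictionary, applying the same combining function whenever a key repeats (an analogy only; Spark additionally partitions and shuffles by key):

```python
pairs = [('a', 1), ('b', 2), ('a', 3)]

totals = {}
for key, value in pairs:
    if key in totals:
        # Same role as the lambda x, y: x + y passed to reduceByKey.
        totals[key] = totals[key] + value
    else:
        totals[key] = value

print(list(totals.items()))  # → [('a', 4), ('b', 2)]
```

In real Spark the combining function is also applied map-side before the shuffle, which is why reduceByKey is preferred over groupByKey followed by a reduce: far less data crosses the network.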
❓ Visualization
Advanced · 2:00 time limit
Visualize the result of the countByValue action
Given this Spark code, what is the output of countByValue?
Apache Spark
from pyspark.sql import SparkSession

spark = SparkSession.builder.master('local').appName('Test').getOrCreate()
rdd = spark.sparkContext.parallelize(['apple', 'banana', 'apple', 'orange', 'banana', 'apple'])
result = rdd.countByValue()
print(result)
💡 Hint
countByValue counts how many times each value appears.
✅ Explanation
countByValue returns a dictionary mapping each distinct value to its count: 'apple' -> 3, 'banana' -> 2, 'orange' -> 1.
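The counting that countByValue performs is equivalent to tallying occurrences of each distinct value; in plain Python the same result can be sketched with collections.Counter (an analogy for the semantics, not Spark's distributed implementation):

```python
from collections import Counter

fruits = ['apple', 'banana', 'apple', 'orange', 'banana', 'apple']

# Tally how many times each value appears, like countByValue does per element.
counts = Counter(fruits)
print(dict(counts))  # → {'apple': 3, 'banana': 2, 'orange': 1}
```

Like collect, countByValue brings its entire result to the driver, so it is only appropriate when the number of distinct values is small enough to fit in driver memory.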
🧠 Conceptual
Expert · 1:30 time limit
Which Spark action returns a single value by combining all elements?
Among these Spark actions, which one returns a single value by combining all elements of an RDD using a function?
💡 Hint
Think about which action combines all elements into one result.
✅ Explanation
The reduce action combines all elements of the RDD using a function and returns a single value to the driver.