Apache Spark · data · ~20 mins

Reduce and aggregate actions in Apache Spark - Practice Problems & Coding Challenges

Challenge - 5 Problems
🎖️
Master of Reduce and Aggregate Actions
Get all challenges correct to earn this badge!
Test your skills under time pressure!
Problem 1: Predict Output (intermediate, 2:00)
What is the output of this Spark reduce operation?
Given the following Spark code, what will be the output printed?
Apache Spark
from pyspark.sql import SparkSession
spark = SparkSession.builder.master('local').appName('Test').getOrCreate()
rdd = spark.sparkContext.parallelize([1, 2, 3, 4])
result = rdd.reduce(lambda x, y: x + y)
print(result)
A. TypeError
B. 24
C. 10
D. None
💡 Hint
Think about what reduce does with the lambda function summing two numbers.
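To build intuition before answering: Spark's RDD.reduce folds all elements into one value with a binary function, much like Python's built-in functools.reduce. A minimal local sketch on different data than the problem uses:

```python
from functools import reduce

# reduce folds the list pairwise: ((5 + 6) + 7) = 18.
# Spark's RDD.reduce behaves the same way, except the function should be
# commutative and associative, since partition results combine in any order.
total = reduce(lambda x, y: x + y, [5, 6, 7])
print(total)  # 18
```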
Problem 2: Data Output (intermediate, 2:00)
What is the result of this aggregate action?
Consider this Spark code using aggregate to compute sum and count. What is the final output?
Apache Spark
from pyspark.sql import SparkSession
spark = SparkSession.builder.master('local').appName('Test').getOrCreate()
rdd = spark.sparkContext.parallelize([1, 2, 3, 4])
zero_value = (0, 0)
seq_op = lambda acc, x: (acc[0] + x, acc[1] + 1)
comb_op = lambda acc1, acc2: (acc1[0] + acc2[0], acc1[1] + acc2[1])
result = rdd.aggregate(zero_value, seq_op, comb_op)
print(result)
A. (10, 4)
B. (24, 4)
C. TypeError
D. (10, 10)
💡 Hint
Aggregate returns a combined result of sum and count.
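As background: aggregate takes a zero value, a seq_op applied to elements within each partition, and a comb_op that merges per-partition results. A plain-Python sketch of the same sum-and-count pattern, with a hypothetical two-partition split and different data than the problem:

```python
from functools import reduce

zero = (0, 0)
seq_op = lambda acc, x: (acc[0] + x, acc[1] + 1)
comb_op = lambda a, b: (a[0] + b[0], a[1] + b[1])

# Pretend the data [10, 20, 30] is split across two partitions.
partitions = [[10, 20], [30]]
# seq_op folds each partition starting from the zero value...
per_partition = [reduce(seq_op, part, zero) for part in partitions]
# ...then comb_op merges the per-partition (sum, count) pairs.
result = reduce(comb_op, per_partition, zero)
print(result)  # (60, 3)
```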
Problem 3: Predict Output (advanced, 2:00)
What is the output of this Spark reduceByKey operation?
Given the following Spark code, what will be the output printed?
Apache Spark
from pyspark.sql import SparkSession
spark = SparkSession.builder.master('local').appName('Test').getOrCreate()
rdd = spark.sparkContext.parallelize([('a', 1), ('b', 2), ('a', 3)])
result = rdd.reduceByKey(lambda x, y: x + y).collect()
print(result)
A. [('a', 4), ('b', 2)]
B. AttributeError
C. TypeError
D. ValueError
💡 Hint
reduceByKey sums values for each key.
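For intuition: reduceByKey groups the pairs by key, then folds the values of each key with the supplied function. A local dict-based sketch on different data than the problem:

```python
pairs = [('x', 10), ('y', 5), ('x', 1)]
merged = {}
for key, value in pairs:
    # The first value seen for a key is kept as-is; later values are
    # combined with the same binary function reduceByKey would apply.
    merged[key] = merged[key] + value if key in merged else value
print(sorted(merged.items()))  # [('x', 11), ('y', 5)]
```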
Problem 4: Visualization (advanced, 2:00)
Visualize the result of countByValue action
Given this Spark code, what is the output of countByValue?
Apache Spark
from pyspark.sql import SparkSession
spark = SparkSession.builder.master('local').appName('Test').getOrCreate()
rdd = spark.sparkContext.parallelize(['apple', 'banana', 'apple', 'orange', 'banana', 'apple'])
result = rdd.countByValue()
print(result)
A. {'apple': 1, 'banana': 1, 'orange': 1}
B. {'apple': 2, 'banana': 2, 'orange': 2}
C. TypeError
D. {'apple': 3, 'banana': 2, 'orange': 1}
💡 Hint
countByValue counts how many times each value appears.
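As a mental model: countByValue returns a dict-like mapping from each distinct element to its frequency, equivalent to building a collections.Counter locally. A sketch on different data than the problem:

```python
from collections import Counter

words = ['red', 'blue', 'red']
# Counter tallies occurrences of each value, which is the same mapping
# RDD.countByValue would hand back to the driver.
counts = Counter(words)
print(dict(counts))  # {'red': 2, 'blue': 1}
```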
Problem 5: 🧠 Conceptual (expert, 1:30)
Which Spark action returns a single value by combining all elements?
Among these Spark actions, which one returns a single value by combining all elements of an RDD using a function?
A. take
B. reduce
C. countByKey
D. collect
💡 Hint
Think about which action combines all elements into one result.