Which of the following reasons best explains why Apache Spark is generally faster than MapReduce for big data processing?
Think about how data is stored and accessed during processing.
Spark keeps data in memory between operations, avoiding the disk reads and writes that MapReduce performs between every stage. This is the main reason Spark is much faster for multi-stage jobs.
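The contrast can be sketched in plain Python (no Spark required). The staged pipeline below simulates MapReduce's behavior by serializing each stage's output to a temporary file and reading it back, while the chained version keeps everything in memory; the stage functions and file handling are illustrative assumptions, not Spark or Hadoop internals.

```python
import os
import pickle
import tempfile

data = list(range(1, 6))

# MapReduce-style: each stage writes its output to disk and the next
# stage reads it back (simulated here with pickle files).
def staged_pipeline(records):
    for stage in (lambda x: x * 2, lambda x: x + 1):
        out = [stage(r) for r in records]
        fd, path = tempfile.mkstemp()
        with os.fdopen(fd, "wb") as f:
            pickle.dump(out, f)          # disk write after the stage
        with open(path, "rb") as f:
            records = pickle.load(f)     # disk read before the next stage
        os.remove(path)
    return records

# Spark-style: stages are chained in memory; nothing touches disk.
def chained_pipeline(records):
    return [(x * 2) + 1 for x in records]

print(staged_pipeline(data))   # [3, 5, 7, 9, 11]
print(chained_pipeline(data))  # same result, no disk round-trips
```

Both pipelines compute the same answer; the difference is purely where the intermediate results live.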
Why is Apache Spark better suited for iterative algorithms like machine learning compared to MapReduce?
Consider how data is reused during multiple passes over the same dataset.
Spark can keep a dataset in memory across iterations, avoiding repeated disk reads. MapReduce must write results to disk and read them back on every iteration, which slows it down considerably.
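A small simulation makes the difference concrete by counting simulated "disk reads" with and without caching; `load_dataset` is a hypothetical stand-in for reading input from disk, and caching it once mirrors what `rdd.cache()` does in Spark.

```python
# Count how many times the (simulated) input is read from disk.
disk_reads = 0

def load_dataset():
    """Hypothetical stand-in for reading the input dataset from disk."""
    global disk_reads
    disk_reads += 1
    return list(range(10))

# MapReduce-style: every iteration rereads the input from disk.
for _ in range(3):
    total = sum(load_dataset())
print("uncached disk reads:", disk_reads)  # 3

# Spark-style: load once, keep in memory, reuse across iterations.
disk_reads = 0
cached = load_dataset()  # one read, then held in memory (like rdd.cache())
for _ in range(3):
    total = sum(cached)
print("cached disk reads:", disk_reads)  # 1
```

Three passes cost three reads without caching but only one with it; with real datasets and many iterations, that gap dominates the runtime.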
Given the following Spark code snippet, what is the output collected in the driver?
rdd = sc.parallelize([1, 2, 3, 4, 5])
result = rdd.filter(lambda x: x % 2 == 0).map(lambda x: x * x).collect()
print(result)
Filter keeps even numbers, then map squares them.
The filter keeps only the even numbers, [2, 4]. The map then squares them, so collect() returns [4, 16], which is what the driver prints.
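The same pipeline can be checked with plain Python list comprehensions, which behave identically to filter and map here and need no Spark context:

```python
# Plain-Python equivalent of the RDD pipeline, to verify the driver output.
data = [1, 2, 3, 4, 5]
evens = [x for x in data if x % 2 == 0]   # filter: [2, 4]
result = [x * x for x in evens]           # map:    [4, 16]
print(result)  # [4, 16]
```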
What feature of Apache Spark helps it recover from failures efficiently without writing intermediate data to disk like MapReduce?
Think about how Spark knows how to rebuild lost data partitions.
Spark records each RDD's lineage, the sequence of transformations used to build it, and uses that record to recompute lost partitions after a failure instead of writing all intermediate data to disk.
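The idea can be sketched with a toy class (this is an illustration of the concept, not Spark's actual implementation): each "RDD" remembers its parent and the transformation that produced it, so a lost partition can be rebuilt by replaying the lineage from the source.

```python
# Toy sketch of lineage-based recovery (hypothetical, not Spark internals).
class ToyRDD:
    def __init__(self, parent=None, transform=None, source=None):
        self.parent = parent        # upstream ToyRDD, if any
        self.transform = transform  # function applied to the parent's data
        self.source = source        # base data for the root RDD
        self.partition = None       # materialized data; may be lost

    def compute(self):
        # Replay the lineage from the source; no intermediate disk writes.
        if self.parent is None:
            self.partition = list(self.source)
        else:
            self.partition = self.transform(self.parent.compute())
        return self.partition

base = ToyRDD(source=[1, 2, 3, 4, 5])
squared = ToyRDD(parent=base, transform=lambda d: [x * x for x in d])

squared.compute()
squared.partition = None   # simulate losing the partition on node failure
print(squared.compute())   # rebuilt from lineage: [1, 4, 9, 16, 25]
```

Because the recipe (not the data) is what gets stored, recovery only costs recomputation of the lost partitions.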
You have a big dataset and need to run complex machine learning algorithms that require multiple passes over the data. Which reason best justifies choosing Apache Spark over MapReduce?
Consider the nature of iterative algorithms and data access speed.
Iterative algorithms reuse the same dataset many times. Spark's in-memory caching avoids the repeated disk I/O that MapReduce incurs on every pass, making it much faster for these workloads.
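The access pattern in question looks like the following sketch: a simple gradient-descent loop (fitting the mean of a toy dataset by minimizing squared error) that scans the same in-memory data on every iteration. The dataset, learning rate, and iteration count are illustrative choices.

```python
# Iterative access pattern that favors in-memory caching:
# the same dataset is scanned on every pass.
data = [2.0, 4.0, 6.0, 8.0]  # held in memory once, reused each iteration

mu, lr = 0.0, 0.1
for _ in range(200):
    # gradient of sum((mu - x)^2) with respect to mu
    grad = sum(2 * (mu - x) for x in data)
    mu -= lr * grad
print(round(mu, 3))  # converges to the mean of the data, 5.0
```

In MapReduce, each of those 200 passes would be a separate job rereading the input from disk; in Spark, a cached RDD makes every pass after the first a memory scan.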