Consider the following Apache Spark code snippet:
data = spark.range(1000)
cached_data = data.cache()
count1 = cached_data.count()
count2 = cached_data.count()
What is the value of count2?
data = spark.range(1000)
cached_data = data.cache()
count1 = cached_data.count()
count2 = cached_data.count()
print(count2)
Caching stores the data in memory after the first action.
The first count() triggers the computation and populates the cache. The second count() reads the cached partitions instead of recomputing the range, so count2 is 1000 and returns quickly.
Given this Spark code:
df = spark.range(10)
df.persist(StorageLevel.MEMORY_AND_DISK)
print(df.storageLevel)
What will be printed?
from pyspark import StorageLevel

df = spark.range(10)
df.persist(StorageLevel.MEMORY_AND_DISK)
print(df.storageLevel)
MEMORY_AND_DISK means data is stored in memory and spilled to disk if needed.
persist(StorageLevel.MEMORY_AND_DISK) keeps partitions in memory and spills them to disk when memory runs short. Note that in PySpark the data is always stored in serialized form, so the printed level reports Serialized (something like "Disk Memory Serialized 1x Replicated"); the deserialized in-memory format of MEMORY_AND_DISK applies only to the Scala/Java API.
In Spark, after calling df.unpersist(), the memory is not freed immediately. Why?
Think about how Spark manages memory and tasks asynchronously.
By default unpersist() is non-blocking: it marks the cached blocks for removal and frees the memory asynchronously, so tasks currently reading those blocks can finish first. Call unpersist(blocking=True) to wait until all blocks are actually removed.
You have a large DataFrame used in multiple iterations of a machine learning algorithm. Which persistence level is best to optimize performance and resource usage?
Consider serialization and fallback when memory is insufficient.
MEMORY_AND_DISK_SER stores partitions serialized in memory and spills them to disk when they do not fit, trading some (de)serialization CPU for a much smaller memory footprint. That makes it a good fit for a large DataFrame reused across iterations. In PySpark, data is always serialized, so StorageLevel.MEMORY_AND_DISK plays this role; the _SER variants belong to the Scala/Java API.
When you cache a DataFrame in Spark and the cluster memory is full, what is the expected behavior?
Think about Spark's memory management and eviction policies.
Spark evicts cached partitions in least-recently-used (LRU) order when memory fills up. An evicted partition is not lost: if it is needed again, Spark recomputes it from its lineage, or reads it back from disk if the storage level includes disk.