Challenge - 5 Problems
Delta Lake Mastery
Get all challenges correct to earn this badge!
Test your skills under time pressure!
❓ Predict Output
Intermediate
Output of Delta Lake table creation and query
What is the output of the following Apache Spark code using Delta Lake?
Apache Spark
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("DeltaExample").getOrCreate()

data = [(1, "apple"), (2, "banana"), (3, "cherry")]
df = spark.createDataFrame(data, ["id", "fruit"])
df.write.format("delta").mode("overwrite").save("/tmp/delta-table")

df2 = spark.read.format("delta").load("/tmp/delta-table")
df2.filter(df2.id > 1).count()
💡 Hint
Count rows where id is greater than 1.
Explanation
The original data has 3 rows with ids 1, 2, and 3. Filtering for id > 1 keeps rows with ids 2 and 3, so count is 2.
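The filter-and-count logic can be checked without a Spark cluster; the following plain-Python sketch mirrors `df2.filter(df2.id > 1).count()` over the same three rows:

```python
# Rows mirroring the DataFrame contents: (id, fruit)
data = [(1, "apple"), (2, "banana"), (3, "cherry")]

# Plain-Python equivalent of df2.filter(df2.id > 1).count()
count = sum(1 for row_id, _ in data if row_id > 1)
print(count)  # → 2
```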
❓ Data Output
Intermediate
Result of Delta Lake update operation
Given a Delta Lake table with data [(1, 10), (2, 20), (3, 30)] stored at '/tmp/delta-update', what is the content of the table after running this update code?
Apache Spark
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("DeltaUpdate").getOrCreate()

data = [(1, 10), (2, 20), (3, 30)]
df = spark.createDataFrame(data, ["id", "value"])
df.write.format("delta").mode("overwrite").save("/tmp/delta-update")

deltaTable = DeltaTable.forPath(spark, "/tmp/delta-update")
deltaTable.update(condition="id == 2", set={"value": "value + 5"})
deltaTable.toDF().orderBy("id").collect()
💡 Hint
Only the row with id 2 is updated by adding 5 to its value.
Explanation
The update changes the value for id 2 from 20 to 25. Other rows remain unchanged.
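The net effect of the conditional update can be sketched in plain Python (a stand-in for the Delta update, not the real API):

```python
data = [(1, 10), (2, 20), (3, 30)]

# Equivalent of update(condition="id == 2", set={"value": "value + 5"}):
# add 5 to the value only where id == 2, leaving other rows unchanged
updated = [(i, v + 5) if i == 2 else (i, v) for i, v in data]
print(updated)  # → [(1, 10), (2, 25), (3, 30)]
```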
🔧 Debug
Advanced
Identify the error in Delta Lake merge code
What error will this Delta Lake merge code produce?
Apache Spark
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("DeltaMerge").getOrCreate()

data_target = [(1, "a"), (2, "b")]
data_source = [(2, "bb"), (3, "c")]
df_target = spark.createDataFrame(data_target, ["id", "value"])
df_source = spark.createDataFrame(data_source, ["id", "value"])
df_target.write.format("delta").mode("overwrite").save("/tmp/delta-merge")

deltaTable = DeltaTable.forPath(spark, "/tmp/delta-merge")
(deltaTable.alias("t")
    .merge(df_source.alias("s"), "t.id = s.id")
    .whenMatchedUpdate(set={"value": "s.value"})
    .whenNotMatchedInsert(values={"id": "s.id", "value": "s.value"})
    .execute())
💡 Hint
Check the types used in set and values arguments in merge.
Explanation
The set and values parameters require column expressions, not string literals. Passing strings causes a TypeError.
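Setting the API question aside, the net effect of this MERGE (update matched rows, insert unmatched ones) is an upsert, which can be sketched in plain Python with dicts standing in for the tables:

```python
# id -> value, standing in for the target and source tables
target = {1: "a", 2: "b"}
source = {2: "bb", 3: "c"}

# whenMatchedUpdate + whenNotMatchedInsert on "t.id = s.id"
# amounts to a dict upsert: source rows win on matching ids,
# and unmatched source rows are inserted
merged = {**target, **source}
print(sorted(merged.items()))  # → [(1, 'a'), (2, 'bb'), (3, 'c')]
```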
❓ Visualization
Advanced
Visualize Delta Lake version history
Which option correctly describes how to visualize the version history of a Delta Lake table?
💡 Hint
Delta Lake supports DESCRIBE HISTORY command for version info.
Explanation
DESCRIBE HISTORY returns the version history of a Delta table. This data can be collected and visualized using Python plotting libraries.
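A minimal sketch of that pipeline, using hand-written rows in place of the output of `spark.sql("DESCRIBE HISTORY delta.\`/tmp/delta-table\`").collect()` (the `version` and `operation` fields below are real columns in the Delta history schema; the row values are made up for illustration):

```python
# Hypothetical history rows; in practice these come from DESCRIBE HISTORY
history = [
    {"version": 0, "operation": "WRITE"},
    {"version": 1, "operation": "UPDATE"},
    {"version": 2, "operation": "MERGE"},
]

# Pull out the series to plot, e.g. versions on the x-axis
versions = [row["version"] for row in history]
operations = [row["operation"] for row in history]

# These lists could then be handed to a plotting library,
# e.g. matplotlib: plt.bar(versions, [1] * len(versions))
print(list(zip(versions, operations)))
```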
🧠 Conceptual
Expert
Understanding Delta Lake ACID guarantees
Which statement best explains how Delta Lake ensures ACID transactions on big data?
💡 Hint
Think about how Delta Lake tracks changes and manages concurrent access.
Explanation
Delta Lake maintains a transaction log that records every change atomically. This log enables snapshot isolation, so readers see a consistent view even during concurrent writes.
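The idea of an append-only, versioned commit log can be illustrated with a toy sketch in plain Python. This is not the real Delta protocol (which uses Parquet data files plus JSON/checkpoint log entries and optimistic concurrency control), just the core mechanism: a commit becomes visible atomically when its log file appears, and replaying committed versions in order yields a consistent snapshot.

```python
import json
import os
import tempfile

log_dir = tempfile.mkdtemp()

def commit(version, actions):
    """Publish one commit: write to a temp file, then rename.

    The rename is the atomic step — readers never observe a
    half-written commit file.
    """
    path = os.path.join(log_dir, f"{version:020d}.json")
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(actions, f)
    os.rename(tmp, path)

def snapshot():
    """Reconstruct table state by replaying committed versions in order."""
    state = {}
    for name in sorted(os.listdir(log_dir)):
        if name.endswith(".json"):
            with open(os.path.join(log_dir, name)) as f:
                state.update(json.load(f))
    return state

commit(0, {"1": 10, "2": 20})   # initial write
commit(1, {"2": 25})            # update id 2
print(snapshot())  # → {'1': 10, '2': 25}
```

Because a reader only ever sees fully published commit files, it gets a consistent view of some version of the table even while a writer is preparing the next commit.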