Apache Spark · ~20 mins

Delta Lake introduction in Apache Spark - Practice Problems & Coding Challenges

Challenge - 5 Problems
Predict Output (intermediate)
Output of Delta Lake table creation and query
What is the output of the following Apache Spark code using Delta Lake?
Apache Spark
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("DeltaExample").getOrCreate()
data = [(1, "apple"), (2, "banana"), (3, "cherry")]
df = spark.createDataFrame(data, ["id", "fruit"])
df.write.format("delta").mode("overwrite").save("/tmp/delta-table")
df2 = spark.read.format("delta").load("/tmp/delta-table")
df2.filter(df2.id > 1).count()
A. 2
B. 0
C. 1
D. 3
💡 Hint
Count rows where id is greater than 1.
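The filter-and-count logic can be sanity-checked without a Spark cluster; a plain-Python sketch of the same predicate over the same rows:

```python
# Plain-Python equivalent of df2.filter(df2.id > 1).count()
# over the rows created in the problem above (no Spark required).
data = [(1, "apple"), (2, "banana"), (3, "cherry")]

count = sum(1 for row_id, _fruit in data if row_id > 1)
```
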
Data Output (intermediate)
Result of Delta Lake update operation
Given a Delta Lake table with data [(1, 10), (2, 20), (3, 30)] stored at '/tmp/delta-update', what is the content of the table after running this update code?
Apache Spark
from delta.tables import DeltaTable
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("DeltaUpdate").getOrCreate()
data = [(1, 10), (2, 20), (3, 30)]
df = spark.createDataFrame(data, ["id", "value"])
df.write.format("delta").mode("overwrite").save("/tmp/delta-update")
deltaTable = DeltaTable.forPath(spark, "/tmp/delta-update")
deltaTable.update(condition = "id == 2", set = {"value": "value + 5"})
deltaTable.toDF().orderBy("id").collect()
A[(1, 10), (2, 20), (3, 35)]
B[(1, 10), (2, 25), (3, 30)]
C[(1, 15), (2, 20), (3, 30)]
D[(1, 10), (2, 5), (3, 30)]
💡 Hint
Only the row with id 2 is updated by adding 5 to its value.
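The conditional-update semantics can be mimicked in plain Python (no Spark needed), applying the set expression only to rows matching the condition:

```python
# Plain-Python equivalent of
#   deltaTable.update(condition="id == 2", set={"value": "value + 5"})
rows = [(1, 10), (2, 20), (3, 30)]

updated = [(i, v + 5) if i == 2 else (i, v) for i, v in rows]
```
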
🔧 Debug (advanced)
Identify the error in Delta Lake merge code
What error will this Delta Lake merge code produce?
Apache Spark
from delta.tables import DeltaTable
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("DeltaMerge").getOrCreate()
data_target = [(1, "a"), (2, "b")]
data_source = [(2, "bb"), (3, "c")]
df_target = spark.createDataFrame(data_target, ["id", "value"])
df_source = spark.createDataFrame(data_source, ["id", "value"])
df_target.write.format("delta").mode("overwrite").save("/tmp/delta-merge")
deltaTable = DeltaTable.forPath(spark, "/tmp/delta-merge")
deltaTable.alias("t").merge(
    df_source.alias("s"),
    "t.id = s.id"
).whenMatchedUpdate(set = {"value": "s.value"})
.whenNotMatchedInsert(values = {"id": "s.id", "value": "s.value"})
.execute()
A. SyntaxError because the multi-line method chain is not wrapped in parentheses
B. No error, the merge executes successfully
C. AnalysisException due to a missing Delta Lake package
D. TypeError because set and values expect column expressions, not strings
💡 Hint
In Python, a method chain split across lines must be wrapped in parentheses (or use backslash continuation).
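For reference, the intended upsert semantics of the merge (update matched rows, insert unmatched ones) can be mimicked in plain Python over the same data:

```python
# Plain-Python sketch of MERGE upsert semantics:
# matched keys are updated, unmatched source keys are inserted.
target = {1: "a", 2: "b"}   # rows in the Delta table
source = {2: "bb", 3: "c"}  # incoming rows

for key, value in source.items():
    target[key] = value  # update on match, insert otherwise

result = sorted(target.items())
```
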
Visualization (advanced)
Visualize Delta Lake version history
Which option correctly describes how to visualize the version history of a Delta Lake table?
A. Use the 'DESCRIBE HISTORY delta.`/path/to/table`' SQL command to get version info and plot it with matplotlib
B. Run 'SHOW VERSIONS delta.`/path/to/table`' to get history and use seaborn to plot
C. Use 'SELECT * FROM delta.`/path/to/table` VERSION AS OF 0' to get history and plot with pandas
D. Run 'HISTORY delta.`/path/to/table`' command and plot with plotly
💡 Hint
Delta Lake supports DESCRIBE HISTORY command for version info.
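To make the shape of that history concrete: a plain-Python sketch of rows like those returned by `spark.sql("DESCRIBE HISTORY delta.`/path/to/table`")`. The column set here is abbreviated for illustration; the real output carries more columns (timestamp, operationParameters, and others):

```python
# Simplified rows mimicking DESCRIBE HISTORY output (abbreviated columns).
history = [
    {"version": 0, "operation": "WRITE"},
    {"version": 1, "operation": "UPDATE"},
    {"version": 2, "operation": "MERGE"},
]

# Pull out the version numbers you might put on a matplotlib x-axis.
versions = [row["version"] for row in history]
```
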
🧠 Conceptual (expert)
Understanding Delta Lake ACID guarantees
Which statement best explains how Delta Lake ensures ACID transactions on big data?
A. Delta Lake uses a single writer lock to prevent concurrent writes, ensuring durability.
B. Delta Lake relies on HDFS replication to guarantee atomic writes and consistency.
C. Delta Lake uses a transaction log that records all changes atomically and supports snapshot isolation for readers.
D. Delta Lake stores data in a relational database to enforce ACID properties.
💡 Hint
Think about how Delta Lake tracks changes and manages concurrent access.
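To make the transaction-log idea concrete: each commit to a Delta table is written as a numbered JSON file under the table's `_delta_log/` directory, with one action (such as `commitInfo` or `add`) per line. Below is a hand-written, simplified commit entry parsed with plain Python; real entries carry many more fields:

```python
import json

# Hand-written, simplified commit modeled on a _delta_log JSON file
# (e.g. _delta_log/00000000000000000001.json); field set abbreviated.
commit = "\n".join([
    '{"commitInfo": {"operation": "WRITE"}}',
    '{"add": {"path": "part-00000.snappy.parquet", "dataChange": true}}',
])

# Each line is one JSON action; readers replay these to build a table snapshot.
actions = [json.loads(line) for line in commit.splitlines()]
operations = [next(iter(action)) for action in actions]
```
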