Apache Spark | data | ~10 mins

Watermarking for late data in Apache Spark - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
1. Fill in the blank (easy)

Complete the code to set a watermark on the streaming DataFrame to handle late data.

Apache Spark
streamingDF = streamingDF.withWatermark('[1]', '10 minutes')
A. timestamp
B. eventTime
C. date
D. time
Common Mistakes
Using a column name that does not exist in the DataFrame.
Using a non-timestamp column for watermarking.
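
To make the idea behind a watermark concrete, here is a minimal pure-Python sketch of the rule. It is not Spark's implementation. It assumes a simplified model in which the watermark is the largest event time seen in earlier batches minus the allowed delay, and any event older than the watermark is treated as too late. The function name `split_late_events` and the batch layout are hypothetical.

```python
from datetime import datetime, timedelta


def split_late_events(batches, delay):
    """Split event times into accepted and dropped ones.

    Simplified model (an assumption of this sketch): the watermark only
    advances after a whole batch has been processed.
    """
    max_event_time = None
    watermark = None
    accepted, dropped = [], []
    for batch in batches:
        for event_time in batch:
            # An event is late when it is older than the current watermark.
            if watermark is not None and event_time < watermark:
                dropped.append(event_time)
            else:
                accepted.append(event_time)
        # Advance the watermark using the largest event time seen so far.
        batch_max = max(batch, default=None)
        if batch_max is not None and (
            max_event_time is None or batch_max > max_event_time
        ):
            max_event_time = batch_max
        if max_event_time is not None:
            watermark = max_event_time - delay
    return accepted, dropped
```

With a 10-minute delay, a first batch whose latest event is at 12:20 moves the watermark to 12:10, so an event stamped 12:05 in the next batch is dropped while one stamped 12:15 is kept.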
2. Fill in the blank (medium)

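Complete the code to drop duplicate events. Including the watermarked event-time column in the key lets Spark discard deduplication state for data older than the watermark delay.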

Apache Spark
cleanedDF = streamingDF.dropDuplicates(['userId', '[1]'])
A. date
B. timestamp
C. eventTime
D. time
Common Mistakes
Using a non-unique column for dropping duplicates.
Using a column different from the watermark timestamp.
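
The reason the event-time column belongs in the deduplication key can be sketched in plain Python. This is an illustrative model only, not how Spark stores state: it assumes the watermark advances after every event (Spark advances it per micro-batch), and the name `deduplicate_with_watermark` is hypothetical.

```python
from datetime import datetime, timedelta


def deduplicate_with_watermark(events, delay):
    """Deduplicate (user_id, event_time) pairs while bounding remembered state.

    Returns the emitted events and the keys still held in state.
    """
    seen = set()
    max_event_time = None
    output = []
    for user_id, event_time in events:
        watermark = None if max_event_time is None else max_event_time - delay
        if watermark is not None and event_time < watermark:
            continue  # too late: no state is kept for this event time any more
        key = (user_id, event_time)
        if key not in seen:
            seen.add(key)
            output.append(key)
        if max_event_time is None or event_time > max_event_time:
            max_event_time = event_time
            cutoff = max_event_time - delay
            # Evict keys older than the new watermark so state stays bounded.
            seen = {k for k in seen if k[1] >= cutoff}
    return output, seen
```

Because every key carries its event time, the sketch can tell which keys are older than the watermark and forget them. A key without an event time would have to be remembered forever.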
3. Fill in the blank (hard)

Fix the error in the watermarking code by completing the missing argument.

Apache Spark
streamingDF = streamingDF.withWatermark('timestamp', '[1]')
A. 10 minutes
B. 10min
C. minutes 10
D. 10
Common Mistakes
Using only a number without units.
Using incorrect unit format.
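
Both mistakes above can be caught before a query is started. The following is a hypothetical pre-check, not Spark's interval parser: it assumes a deliberately conservative format of a whole number, whitespace, and a spelled-out time unit, and the unit list is an assumption of this sketch. Spark itself remains the authority on which delay strings it accepts.

```python
import re

# Assumed unit list for this sketch; Spark's own parser decides what is valid.
_UNITS = ("microsecond", "millisecond", "second", "minute", "hour", "day", "week")
_DELAY_PATTERN = re.compile(
    r"^\s*(\d+)\s+(" + "|".join(_UNITS) + r")s?\s*$", re.IGNORECASE
)


def looks_like_delay(text):
    """Return True if text has the shape '<number> <unit>', e.g. '30 seconds'."""
    return _DELAY_PATTERN.match(text) is not None
```

A string such as `'30 seconds'` or `'2 hours'` passes this check, while a bare number or a unit placed before the number does not.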
4. Fill in the blank (hard)

Fill both blanks to create a streaming aggregation with watermarking and windowing.

Apache Spark
result = streamingDF.withWatermark('[1]', '5 minutes').groupBy(window('[2]', '10 minutes')).count()
A. timestamp
B. eventTime
Common Mistakes
Using different columns for watermark and window.
Using a non-timestamp column.
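
The window and the watermark both reason about the same event time, which is why they should read the same column. As a rough illustration of what a 10-minute tumbling window does with that column, here is a pure-Python sketch. The helper names `window_start` and `count_by_window` are hypothetical, and aligning windows to the Unix epoch is an assumption of this sketch.

```python
from collections import Counter
from datetime import datetime, timedelta

# Assumption of this sketch: windows are aligned to the Unix epoch.
EPOCH = datetime(1970, 1, 1)


def window_start(event_time, width):
    """Return the start of the tumbling window that contains event_time."""
    offset = (event_time - EPOCH) % width
    return event_time - offset


def count_by_window(event_times, width):
    """Count events per tumbling window, keyed by window start."""
    return Counter(window_start(t, width) for t in event_times)
```

Events stamped 12:03 and 12:07 fall into the window starting at 12:00, while an event stamped exactly 12:10 opens the next window.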
5. Fill in the blank (hard)

Fill all three blanks to filter late data using watermark and event time comparison.

Apache Spark
filteredDF = streamingDF.withWatermark('[1]', '15 minutes').filter(streamingDF.[2] > streamingDF.[3])
A. timestamp
B. eventTime
C. watermark
Common Mistakes
Using inconsistent column names.
Comparing wrong columns in filter.