Practice - 5 Tasks

Answer the questions below

1. Fill in the blank

easy

Complete the code to set a watermark on the streaming DataFrame to handle late data.

Apache Spark

streamingDF = streamingDF.withWatermark('[1]', '10 minutes')

A. timestamp

B. eventTime

C. date

D. time
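
As background for this task, the idea behind a watermark can be sketched in plain Python. This is a simplified model, not Spark code: the function name `apply_watermark` and the per-record bookkeeping are assumptions made for illustration. Spark itself advances the watermark between micro-batches and applies it inside stateful operations, so the exact timing differs.

```python
from datetime import datetime, timedelta

def apply_watermark(event_times, delay):
    """Hypothetical helper: split events into kept and dropped lists.

    The watermark is modelled as (latest event time seen so far - delay).
    An event older than the current watermark counts as too late.
    """
    max_seen = None
    kept, dropped = [], []
    for t in event_times:
        if max_seen is None or t > max_seen:
            max_seen = t
        watermark = max_seen - delay
        if t < watermark:
            dropped.append(t)   # arrived after the allowed lateness
        else:
            kept.append(t)
    return kept, dropped

base = datetime(2024, 1, 1, 12, 0)
events = [
    base,                          # 12:00
    base + timedelta(minutes=20),  # 12:20, pushes the watermark to 12:10
    base + timedelta(minutes=5),   # 12:05, older than 12:10
    base + timedelta(minutes=15),  # 12:15, still within the delay
]
kept, dropped = apply_watermark(events, timedelta(minutes=10))
```

With a 10-minute delay, only the 12:05 event falls behind the watermark in this model; the 12:15 event arrives out of order but is still accepted.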

2. Fill in the blank

medium

Complete the code to drop duplicate records, keyed on userId and the event-time column, so that the watermark can bound how long deduplication state is kept.

Apache Spark

cleanedDF = streamingDF.dropDuplicates(['userId', '[1]'])

A. date

B. timestamp

C. eventTime

D. time
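
To see why the event-time column matters when deduplicating a stream, here is a rough plain-Python model. It is not Spark's implementation: the function `dedup_with_watermark`, the use of integer minutes as event times, and the per-record eviction are all assumptions for illustration.

```python
def dedup_with_watermark(records, delay):
    """Hypothetical helper: deduplicate (user_id, event_minute) pairs.

    Keys older than the watermark are evicted from the 'seen' state,
    which is what keeps the state from growing without bound.
    """
    seen = set()
    max_seen = None
    out = []
    for user_id, t in records:
        max_seen = t if max_seen is None else max(max_seen, t)
        watermark = max_seen - delay
        # Forget keys whose event time has fallen behind the watermark.
        seen = {key for key in seen if key[1] >= watermark}
        if t < watermark:
            continue  # too late to be considered at all
        key = (user_id, t)
        if key in seen:
            continue  # duplicate within the retained state
        seen.add(key)
        out.append(key)
    return out, seen

records = [("u1", 0), ("u1", 0), ("u2", 3), ("u1", 30), ("u1", 0)]
unique, state = dedup_with_watermark(records, delay=10)
```

In this model the second `("u1", 0)` is dropped as a duplicate, while the last one is dropped because it is behind the watermark. After the event at minute 30, the state holds a single key.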

3. Fill in the blank

hard

Fix the error in the watermarking code by completing the missing argument.

Apache Spark

streamingDF = streamingDF.withWatermark('timestamp', '[1]')

A. 10 minutes

B. 10min

C. minutes 10

D. 10

4. Fill in the blank

hard

Fill both blanks to create a streaming aggregation with watermarking and windowing.

Apache Spark

result = streamingDF.withWatermark('[1]', '5 minutes').groupBy(window('[2]', '10 minutes')).count()

A. timestamp

B. eventTime
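
The interaction between a 10-minute window and a 5-minute watermark in this task can be modelled in plain Python. This is a sketch under stated assumptions, not Spark's engine: `window_start` and `windowed_counts` are hypothetical helpers, event times are integer minutes, and a window is treated as closed once its end is at or before the watermark (the exact boundary condition in Spark is not taken from the source).

```python
from collections import Counter

def window_start(t, size):
    """Start of the tumbling window of length `size` that contains t."""
    return (t // size) * size

def windowed_counts(event_minutes, window_size, delay):
    """Hypothetical helper: count events per tumbling window.

    Events that belong to a window which already closed behind the
    watermark are ignored.
    """
    counts = Counter()
    max_seen = None
    for t in event_minutes:
        max_seen = t if max_seen is None else max(max_seen, t)
        watermark = max_seen - delay
        start = window_start(t, window_size)
        end = start + window_size
        if end <= watermark:
            continue  # window already closed in this model
        counts[(start, end)] += 1
    return dict(counts)

events = [1, 4, 12, 27, 3, 18]
result = windowed_counts(events, window_size=10, delay=5)
```

Once the event at minute 27 arrives, the watermark in this model is 22, so the late events at minutes 3 and 18 no longer update their windows.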

5. Fill in the blank

hard

Fill all three blanks to filter late data using watermark and event time comparison.

Apache Spark

filteredDF = streamingDF.withWatermark('[1]', '15 minutes').filter(streamingDF.[2] > streamingDF.[3])

A. timestamp

B. eventTime

C. watermark
