Practice - 5 Tasks
Answer the questions below.
Task 1 (fill in the blank)
Easy: Complete the code to perform an inner join on streaming DataFrames.
Apache Spark
joined_stream = stream_df1.join(stream_df2, on=[1], how='inner')
💡 Hint (Common Mistakes):
- Using a column not present in both DataFrames causes errors.
- Using a non-unique column can cause unexpected duplicates.

Explanation: The join key is usually a common column like 'id' to match records between streams.
Task 2 (fill in the blank)
Medium: Complete the code to specify a watermark on the streaming DataFrame.
Apache Spark
stream_df = stream_df.withWatermark('[1]', '10 minutes')
💡 Hint (Common Mistakes):
- Applying the watermark on processing time instead of event time.
- Using a column that is not a timestamp type.

Explanation: Watermarking is applied on event-time columns to handle late data.
Task 3 (fill in the blank)
Hard: Fix the error in the join condition for streaming DataFrames.
Apache Spark
joined_stream = stream_df1.join(stream_df2, on=stream_df1.[1] == stream_df2.id)
💡 Hint (Common Mistakes):
- Using different columns that do not match in the join condition.
- Using a column that does not exist in stream_df1.

Explanation: The join condition must compare the same key column from both DataFrames.
Task 4 (fill in the blank)
Hard: Fill both blanks to create a streaming join with watermark and time range condition.
Apache Spark
joined_stream = stream_df1.withWatermark('[1]', '5 minutes') \
    .join(stream_df2.withWatermark('[2]', '5 minutes'), on='id')
💡 Hint (Common Mistakes):
- Using different columns for the watermark in each stream.
- Using processing time instead of event time.

Explanation: Watermarking both streams on the event-time column is required for stream-stream joins.
Task 5 (fill in the blank)
Hard: Fill all three blanks to filter joined streaming data by time range and select columns.
Apache Spark
result = joined_stream.filter(
        (joined_stream.[1] >= joined_stream.[2] - expr('interval 1 hour')) &
        (joined_stream.[3] <= joined_stream.[2] + expr('interval 1 hour'))
    ) \
    .select('id', 'value', 'timestamp')
💡 Hint (Common Mistakes):
- Mixing up the timestamp and eventTime columns.
- Using processingTime, which is not suitable for event-time filtering.

Explanation: Filtering compares the timestamp column against eventTime within a 1-hour window.