Apache Spark · ~10 mins

Why streaming enables real-time analytics in Apache Spark - Test Your Understanding

Practice - 5 Tasks
Answer the questions below
Question 1: fill in the blank (easy)

Complete the code to read streaming data from a socket source.

streamingDF = spark.readStream.format([1]).option("host", "localhost").option("port", 9999).load()
A. "json"
B. "csv"
C. "socket"
D. "parquet"
Common Mistakes
Using file formats like csv or json instead of socket for a streaming source.
Forgetting to specify the host and port options.
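For reference, a completed socket read might look like the sketch below. The host and port values are illustrative, and `spark` is assumed to be an existing `SparkSession`.

```python
# Sketch: read a text stream from a TCP socket (host/port are illustrative).
# Assumes `spark` is an existing SparkSession.
streamingDF = (
    spark.readStream
    .format("socket")              # socket source yields one string column, "value"
    .option("host", "localhost")
    .option("port", 9999)          # e.g. fed locally by `nc -lk 9999`
    .load()
)
```

Note that `load()` is lazy: the socket is not connected until a query on this DataFrame is started.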
Question 2: fill in the blank (medium)

Complete the code to write streaming data to the console sink.

query = streamingDF.writeStream.format([1]).outputMode("append").start()
A. "parquet"
B. "console"
C. "memory"
D. "csv"
Common Mistakes
Choosing a file format like parquet or csv, which writes to files rather than to the console.
Using the memory sink, which stores rows in an in-memory table but does not print them.
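A completed console write could be sketched as follows; `streamingDF` is assumed to come from a streaming read such as the socket source above.

```python
# Sketch: print each micro-batch to stdout via the console sink (debugging aid).
# Assumes `streamingDF` was created by a streaming read.
query = (
    streamingDF.writeStream
    .format("console")         # console sink: prints rows to stdout
    .outputMode("append")      # emit only rows added since the last trigger
    .start()
)
# query.awaitTermination()    # optionally block until the stream stops
```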
Question 3: fill in the blank (hard)

Complete the code to define a streaming aggregation that sums the 'value' column for each category.

aggDF = streamingDF.groupBy("category").agg([1]("value"))
A. sum
B. countDistinct
C. collect_list
D. max
Common Mistakes
Using collect_list, which collects values into a list but does not aggregate them numerically.
Using countDistinct, which counts unique items rather than summing them.
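A completed per-category sum might look like this sketch; the alias `total_value` is illustrative, and `streamingDF` is assumed to have "category" and "value" columns.

```python
from pyspark.sql import functions as F

# Sketch: a running sum of "value" per category.
# Assumes `streamingDF` has "category" and "value" columns.
aggDF = streamingDF.groupBy("category").agg(F.sum("value").alias("total_value"))
```

In Structured Streaming this defines an unbounded aggregation that Spark updates incrementally as new rows arrive; nothing runs until a query is started on `aggDF`.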
Question 4: fill in the blank (hard)

Fill both blanks to filter streaming data for values greater than 100 and select only the 'category' and 'value' columns.

filteredDF = streamingDF.filter(streamingDF.value [1] 100).select([2], "value")
A. >
B. "category"
C. <
D. "timestamp"
Common Mistakes
Using < instead of > in the filter condition.
Selecting the wrong column, such as 'timestamp' instead of 'category'.
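Filled in, the filter-and-project step could be sketched like this, again assuming `streamingDF` has "category" and "value" columns:

```python
# Sketch: keep rows with value > 100, then project two columns.
# Assumes `streamingDF` has "category" and "value" columns.
filteredDF = (
    streamingDF
    .filter(streamingDF.value > 100)   # comparison predicate, applied per row
    .select("category", "value")       # drop all other columns
)
```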
Question 5: fill in the blank (hard)

Fill all three blanks to create a streaming query that writes aggregated data to memory with complete output mode and a query name 'aggQuery'.

query = aggDF.writeStream.format([1]).outputMode([2]).queryName([3]).start()
A. "memory"
B. "complete"
C. "aggQuery"
D. "append"
Common Mistakes
Using append mode, which outputs only newly added rows rather than the full aggregation.
Using the console sink or a file format instead of the memory sink.
Not naming the query, or using incorrect queryName syntax.
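Putting the three blanks together, a memory-sink query might be sketched as follows; `aggDF` is assumed to be a streaming aggregation such as the groupBy/sum result from Question 3.

```python
# Sketch: write the full (complete) aggregation to an in-memory table.
# Assumes `aggDF` is a streaming aggregation (e.g. a groupBy().agg() result).
query = (
    aggDF.writeStream
    .format("memory")          # registers an in-memory table for ad-hoc queries
    .outputMode("complete")    # re-emit the entire aggregation on each trigger
    .queryName("aggQuery")     # the in-memory table is named after the query
    .start()
)
# While the query runs, the table can be inspected with SQL:
# spark.sql("SELECT * FROM aggQuery").show()
```

The memory sink is intended for testing and debugging, since the whole output table is kept in the driver's memory.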