Apache Spark · data · ~10 mins

Structured Streaming basics in Apache Spark - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
Task 1: fill in the blank (easy)

Complete the code to read a streaming DataFrame from a socket source.

Apache Spark
streamingDF = spark.readStream.format([1]).option("host", "localhost").option("port", 9999).load()
A. "csv"
B. "parquet"
C. "json"
D. "socket"
Common Mistakes
Using file formats like csv or json instead of socket for streaming from a socket.
Forgetting to specify the format as a string.
Task 2: fill in the blank (medium)

Complete the code to start the streaming query that writes output to the console.

Apache Spark
query = streamingDF.writeStream.format([1]).start()
A. "console"
B. "memory"
C. "parquet"
D. "csv"
Common Mistakes
Using file formats like parquet or csv as output format without specifying a path.
Confusing memory sink with console sink.
Task 3: fill in the blank (hard)

Complete the code to specify the output mode for the streaming query.

Apache Spark
query = streamingDF.writeStream.outputMode([1]).format("console").start()
A. "complete"
B. "overwrite"
C. "append"
D. "update"
Common Mistakes
Using 'overwrite', which is a batch save mode, not a valid streaming output mode.
Forgetting that 'complete' mode outputs the entire result table on every trigger and requires an aggregation.
Task 4: fill in the blank (hard)

Fill both blanks to create a streaming aggregation that counts words from a streaming DataFrame.

Apache Spark
wordCounts = streamingDF.selectExpr("explode(split(value, ' ')) as word").groupBy([1]).count().writeStream.outputMode([2]).format("console").start()
A. "word"
B. "append"
C. "complete"
D. "value"
Common Mistakes
Grouping by the wrong column, e.g. 'value' instead of the derived 'word' column.
Using 'append' mode, which does not work for aggregations without a watermark.
Task 5: fill in the blank (hard)

Fill all three blanks to define a streaming query that reads JSON files, selects a column, and writes to memory sink.

Apache Spark
streamingDF = spark.readStream.format([1]).load("/path/to/json")
query = streamingDF.select([2]).writeStream.format([3]).start()
A. "json"
B. "name"
C. "memory"
D. "csv"
Common Mistakes
Using wrong format like 'csv' for JSON files.
Selecting a column name that does not exist.
Using console sink instead of memory sink when intending to query data.