Complete the code to read streaming data from a socket source.
streamingDF = spark.readStream.format([1]).option("host", "localhost").option("port", 9999).load()
The socket format lets Spark read streaming text data from a TCP socket as a single string column named "value", enabling real-time ingestion. Note that the socket source is not fault-tolerant and is intended for testing rather than production.
Complete the code to write streaming data to the console sink.
query = streamingDF.writeStream.format([1]).outputMode("append").start()
The console sink prints each micro-batch to the console as it arrives, which is useful for debugging and monitoring during development.
Complete the code to correctly define the streaming aggregation.
aggDF = streamingDF.groupBy("category").agg([1]("value"))
The sum function (imported from pyspark.sql.functions, not Python's builtin sum) aggregates the values within each category for streaming analytics.
Fill both blanks to filter streaming data for values greater than 100 and select only the 'category' and 'value' columns.
filteredDF = streamingDF.filter(streamingDF.value [1] 100).select([2], "value")
The filter uses > to keep values greater than 100, and select chooses the 'category' and 'value' columns for analysis.
Fill all three blanks to create a streaming query that writes aggregated data to memory with complete output mode and a query name 'aggQuery'.
query = aggDF.writeStream.format([1]).outputMode([2]).queryName([3]).start()
Writing to the memory sink with complete output mode stores the full aggregation result in an in-memory table each batch; the query name 'aggQuery' both identifies the query for management and becomes the name of the table you can query with SQL.