Recall & Review
beginner
What is Structured Streaming in Apache Spark?
Structured Streaming is a scalable, fault-tolerant stream processing engine built on the Spark SQL engine. It lets you express computations on live data streams with the same DataFrame and Dataset APIs you use for batch data.
beginner
How does Structured Streaming treat streaming data internally?
Structured Streaming treats a data stream as an unbounded table to which new rows are continuously appended. A query on this table updates its result incrementally as new data arrives.
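As a rough illustration of the unbounded-table idea, the plain-Python sketch below appends each new batch of rows to an input table and updates a running count from the new rows only. It is a toy model, not Spark code, and the class name `RunningWordCount` and its method `on_new_data` are made up for this example.

```python
from collections import Counter


class RunningWordCount:
    """Toy model (hypothetical, not a Spark API): a growing input table
    plus a result table that is updated incrementally."""

    def __init__(self):
        self.input_rows = []     # the conceptual unbounded input table
        self.counts = Counter()  # the result table

    def on_new_data(self, new_rows):
        # New rows are appended to the input table...
        self.input_rows.extend(new_rows)
        # ...and the result is updated from the new rows alone,
        # without rescanning the rows that arrived earlier.
        self.counts.update(new_rows)
        return dict(self.counts)


query = RunningWordCount()
first = query.on_new_data(["cat", "dog"])
second = query.on_new_data(["cat"])
print(first)   # {'cat': 1, 'dog': 1}
print(second)  # {'cat': 2, 'dog': 1}
```

The toy keeps every input row only to make the "table" visible. Spark itself does not materialize the whole input table; it retains just the state needed to update the result.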
intermediate
What is a 'trigger' in Structured Streaming?
A trigger defines when the streaming query processes new data. For example, a fixed-interval trigger can start a micro-batch every 1 second, while the default trigger starts each micro-batch as soon as the previous one finishes.
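To make the effect of a fixed-interval trigger concrete, here is a small plain-Python sketch that groups events into micro-batches by the trigger firing that would pick them up. It is a simplified model under stated assumptions (logical timestamps in seconds, a hypothetical helper named `micro_batches`), not Spark code. In PySpark the interval itself is set on the stream writer with `.trigger(processingTime="1 second")`.

```python
def micro_batches(arrivals, interval_s):
    """Hypothetical helper: group (arrival_time_s, value) events by the
    first trigger firing after they arrive."""
    batches = {}
    for arrival_s, value in arrivals:
        # An event arriving in [k * interval, (k + 1) * interval) is
        # processed when the trigger fires at (k + 1) * interval.
        fire_at = (int(arrival_s // interval_s) + 1) * interval_s
        batches.setdefault(fire_at, []).append(value)
    return sorted(batches.items())


arrivals = [(0.2, "a"), (0.7, "b"), (1.1, "c"), (3.5, "d")]

every_second = micro_batches(arrivals, 1)
every_two_seconds = micro_batches(arrivals, 2)

print(every_second)       # [(1, ['a', 'b']), (2, ['c']), (4, ['d'])]
print(every_two_seconds)  # [(2, ['a', 'b', 'c']), (4, ['d'])]
```

A longer interval yields fewer, larger batches at the cost of higher latency. Intervals in which nothing arrived produce no batch in this model.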
intermediate
Name two common output modes in Structured Streaming.
1. Append mode: only the rows added to the result table since the last trigger are written to the sink.
2. Complete mode: the entire result table is written to the sink after every trigger.
There is also Update mode, which writes only the rows that were updated since the last trigger.
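The difference between the output modes can be sketched by comparing the result table before and after a trigger. The function below is a plain-Python toy with a hypothetical name (`rows_to_emit`), not a Spark API. In PySpark the mode is chosen with `.outputMode("append")`, `.outputMode("complete")`, or `.outputMode("update")` on the stream writer.

```python
def rows_to_emit(mode, previous, current):
    """Hypothetical helper: pick which rows of the result table to write,
    given the table before (previous) and after (current) a trigger."""
    if mode == "complete":
        # Write the whole result table every time.
        return dict(current)
    if mode == "update":
        # Write rows that are new or whose value changed.
        return {k: v for k, v in current.items()
                if k not in previous or previous[k] != v}
    if mode == "append":
        # Write only rows that did not exist before.
        return {k: v for k, v in current.items() if k not in previous}
    raise ValueError(f"unknown output mode: {mode}")


previous = {"cat": 1, "dog": 1}
current = {"cat": 2, "dog": 1, "owl": 1}

complete_out = rows_to_emit("complete", previous, current)
update_out = rows_to_emit("update", previous, current)
append_out = rows_to_emit("append", previous, current)

print(complete_out)  # {'cat': 2, 'dog': 1, 'owl': 1}
print(update_out)    # {'cat': 2, 'owl': 1}
print(append_out)    # {'owl': 1}
```

Note that the changed `cat` row is missing from the append output. Append mode is intended for queries whose existing result rows are not expected to change, which this toy data deliberately violates to show the contrast.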
intermediate
What is checkpointing in Structured Streaming and why is it important?
Checkpointing saves the state and progress of a streaming query to reliable storage. It helps recover from failures without losing data or processing progress.
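The recovery idea can be sketched in plain Python: after every batch, the current offset and running state are written to a file, and a restarted run resumes from that file instead of starting over. This is a toy model under stated assumptions (a JSON file as the checkpoint, a hypothetical `run` function, a "crash" simulated by stopping after one batch), not how Spark lays out its checkpoint directory. In PySpark the location is set with `.option("checkpointLocation", ...)` on the stream writer.

```python
import json
import os
import tempfile


def run(source, checkpoint_path, batch_size, max_batches=None):
    """Hypothetical helper: sum the source in batches, checkpointing
    the offset and running total after each batch."""
    # Restore progress from the checkpoint if one exists.
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)
    else:
        state = {"offset": 0, "total": 0}

    done = 0
    while state["offset"] < len(source) and (max_batches is None or done < max_batches):
        batch = source[state["offset"]:state["offset"] + batch_size]
        state["total"] += sum(batch)
        state["offset"] += len(batch)
        # Write to a temporary file, then rename, so a crash cannot
        # leave a half-written checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, checkpoint_path)
        done += 1
    return state


source = [1, 2, 3, 4, 5, 6]
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "checkpoint.json")
    # The first run stops after one batch, standing in for a failure.
    before_crash = run(source, path, batch_size=2, max_batches=1)
    # The restarted run picks up at offset 2 rather than at 0.
    after_restart = run(source, path, batch_size=2)

print(before_crash)   # {'offset': 2, 'total': 3}
print(after_restart)  # {'offset': 6, 'total': 21}
```

Because both the offset and the running total are restored, the restarted run neither skips records nor counts any of them twice.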
What does Structured Streaming treat streaming data as?
Structured Streaming treats streaming data as an unbounded table that keeps growing as new data arrives.
Which output mode outputs only new rows added since the last trigger?
Append mode outputs only the new rows added since the last trigger.
What is the purpose of checkpointing in Structured Streaming?
Checkpointing saves the state and progress so the query can recover after failures.
Which API does Structured Streaming use to process data?
Structured Streaming uses the Spark SQL DataFrame API for processing streaming data.
What does a trigger control in Structured Streaming?
A trigger controls how often the streaming query processes new data.
Explain how Structured Streaming processes live data using the concept of an unbounded table.
Think about how new data keeps adding rows to a table that never ends.
Describe the role of checkpointing and triggers in ensuring reliable and timely streaming data processing.
Checkpointing helps recover from crashes; triggers decide when to run the query.