
Structured Streaming basics in Apache Spark - Cheat Sheet & Quick Revision

Recall & Review
beginner
What is Structured Streaming in Apache Spark?
Structured Streaming is a scalable and fault-tolerant stream processing engine built on the Spark SQL engine. It lets you process live data streams with the same DataFrame and Dataset APIs that you use for batch data.
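As a rough illustration of using the same API for batch and streaming, the sketch below applies one transformation function to a batch DataFrame and to a streaming one. It assumes pyspark is installed and uses Spark's built-in `rate` test source. The names `count_by_bucket` and `run_demo`, the app name, and the `bucket` column are made up for this example.

```python
def count_by_bucket(df):
    """Group rows by value modulo 3 and count them.

    The same function accepts a batch DataFrame or a streaming DataFrame.
    """
    return df.groupBy((df["value"] % 3).alias("bucket")).count()


def run_demo():
    # Imported lazily so that defining the functions does not require pyspark.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("structured-streaming-basics").getOrCreate()

    # Batch input: a fixed set of ten rows with a "value" column.
    batch_df = spark.range(10).withColumnRenamed("id", "value")
    count_by_bucket(batch_df).show()

    # Streaming input: the "rate" source emits rows with "timestamp" and "value".
    stream_df = spark.readStream.format("rate").option("rowsPerSecond", 5).load()
    query = (
        count_by_bucket(stream_df)
        .writeStream.outputMode("complete")
        .format("console")
        .start()
    )

    query.awaitTermination(10)  # let the stream run for about ten seconds
    query.stop()
    spark.stop()
```

Calling `run_demo()` on a machine with Spark available prints the batch counts once, then prints the streaming counts repeatedly as new rows arrive.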
beginner
How does Structured Streaming treat streaming data internally?
Structured Streaming treats streaming data as an unbounded table that is continuously appended. Queries on this table produce incremental results as new data arrives.
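The unbounded-table idea can be mimicked in plain Python. This is a toy model, not Spark code: the class `UnboundedTable` and its `append` method are invented for illustration. It shows a running count being updated from only the newly arrived rows while still matching a full recount of the table.

```python
from collections import Counter


class UnboundedTable:
    """Toy stand-in for a stream: rows are only ever appended."""

    def __init__(self):
        self.rows = []
        self.counts = Counter()  # running query result

    def append(self, new_rows):
        # New data arrives: grow the table, then update the result
        # using only the new rows instead of rescanning everything.
        self.rows.extend(new_rows)
        self.counts.update(new_rows)
        return dict(self.counts)


table = UnboundedTable()
first = table.append(["spark", "stream"])
second = table.append(["spark"])

print(first)
print(second)
```

The incremental result after each append equals what a batch count over all rows seen so far would return.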
intermediate
What is a 'trigger' in Structured Streaming?
A trigger defines how often the streaming query processes new data. For example, a trigger can run the query every 1 second or process data as soon as it arrives.
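In PySpark, a fixed interval is set with `writeStream.trigger(processingTime="1 second")`. The plain-Python sketch below is only a simplified model of what such an interval does: it groups events into one batch per interval. The function `micro_batches` and the sample events are hypothetical, and intervals with no data are skipped.

```python
def micro_batches(events, interval_seconds):
    """Group (arrival_time, value) events into one batch per trigger interval."""
    batches = {}
    for arrival_time, value in events:
        # Every event that arrives within the same interval joins the same batch.
        batch_id = int(arrival_time // interval_seconds)
        batches.setdefault(batch_id, []).append(value)
    # Return batches in time order; intervals without data produce no batch.
    return [batches[batch_id] for batch_id in sorted(batches)]


events = [(0.2, "a"), (0.9, "b"), (1.1, "c"), (3.5, "d")]
batches = micro_batches(events, interval_seconds=1.0)
print(batches)
```

A shorter interval gives smaller, more frequent batches and lower latency. A longer interval gives fewer, larger batches.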
intermediate
Name two common output modes in Structured Streaming.
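1. Append mode: only the rows added to the result table since the last trigger are written to the output.
2. Complete mode: the entire result table is written to the output after every trigger.
There is also a third mode, Update mode, which outputs only the rows that changed since the last trigger.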
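In PySpark the mode is chosen with `writeStream.outputMode("append")`, `"complete"`, or `"update"`. The sketch below is a simplified plain-Python model of what each mode would emit for one trigger, given the result table before and after it. The function `emit` and the sample tables are invented for illustration. Real Append mode additionally assumes that rows never change once written, which this toy does not enforce.

```python
def emit(previous, current, mode):
    """Return the rows a sink would receive for one trigger, per output mode."""
    if mode == "complete":
        # The whole result table, every time.
        return dict(current)
    if mode == "update":
        # Only rows that are new or whose value changed.
        return {k: v for k, v in current.items() if previous.get(k) != v}
    if mode == "append":
        # Only rows that did not exist before this trigger.
        return {k: v for k, v in current.items() if k not in previous}
    raise ValueError(f"unknown output mode: {mode}")


previous = {"spark": 1, "stream": 1}
current = {"spark": 2, "stream": 1, "kafka": 1}

for mode in ("append", "update", "complete"):
    print(mode, emit(previous, current, mode))
```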
intermediate
What is checkpointing in Structured Streaming and why is it important?
Checkpointing saves the state and progress of a streaming query to reliable storage. It helps recover from failures without losing data or processing progress.
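In PySpark, checkpointing is enabled with `writeStream.option("checkpointLocation", path)`. The plain-Python sketch below only imitates the idea with a JSON file: it stores the read offset and running total after every row, so a restarted run resumes where the previous one stopped. The function `process`, its `max_rows` parameter (used here to simulate a crash), and the file name are hypothetical.

```python
import json
import os
import tempfile


def process(source, checkpoint_path, max_rows=None):
    """Sum the source values, resuming from the offset saved in checkpoint_path."""
    state = {"offset": 0, "total": 0}
    if os.path.exists(checkpoint_path):
        # Recover progress and state from the previous run.
        with open(checkpoint_path) as f:
            state = json.load(f)

    end = len(source)
    if max_rows is not None:
        end = min(end, state["offset"] + max_rows)

    for value in source[state["offset"]:end]:
        state["total"] += value
        state["offset"] += 1
        # Persist progress after each row so a crash loses nothing.
        with open(checkpoint_path, "w") as f:
            json.dump(state, f)
    return state


source = [1, 2, 3, 4, 5]
with tempfile.TemporaryDirectory() as checkpoint_dir:
    path = os.path.join(checkpoint_dir, "checkpoint.json")
    before_crash = process(source, path, max_rows=2)  # stops early, as if it crashed
    after_restart = process(source, path)             # resumes from the saved offset

print(before_crash)
print(after_restart)
```

The restarted run neither reprocesses the first two rows nor skips any of the remaining ones, which is the guarantee the checkpoint is meant to provide.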
What does Structured Streaming treat streaming data as?
A. An unbounded table
B. A static file
C. A batch job
D. A database
Which output mode outputs only new rows added since the last trigger?
A. Overwrite mode
B. Complete mode
C. Update mode
D. Append mode
What is the purpose of checkpointing in Structured Streaming?
A. To speed up queries
B. To format output
C. To save query progress for recovery
D. To delete old data
Which API does Structured Streaming use to process data?
A. Spark SQL DataFrame API
B. RDD API
C. MapReduce API
D. Hadoop API
What does a trigger control in Structured Streaming?
A. How data is stored
B. How often the query runs
C. The output format
D. The data source
Explain how Structured Streaming processes live data using the concept of an unbounded table.
Think about how new data keeps adding rows to a table that never ends.
Describe the role of checkpointing and triggers in ensuring reliable and timely streaming data processing.
Checkpointing helps recover from crashes; triggers decide when to run the query.