
Output modes (append, complete, update) in Apache Spark - Step-by-Step Execution

Concept Flow - Output modes (append, complete, update)
Start Streaming Query -> New Data Arrives -> Check Output Mode -> Append: add new rows only -> Write to Sink -> Wait for next batch -> Repeat
This flow shows how Spark Structured Streaming handles new data based on the output mode: append writes only new rows, complete rewrites the whole result table, and update writes only the rows that are new or changed.
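The three branches of this flow can be modeled in a few lines of plain Python. This is only a sketch, not Spark code: the function name `process_batch`, the `(id, val)` tuple layout, and the rule that the result table keeps the latest `val` per `id` are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, int]  # (id, val); hypothetical row layout for this sketch

MODES = ("append", "update", "complete")


def process_batch(mode: str, result: Dict[int, int], batch: List[Row]) -> List[Row]:
    """Apply one micro-batch to the result table and return the rows sent to the sink.

    `result` maps id -> latest val and is mutated in place. It stands in for
    Spark's result table; the "latest val per id" rule is an assumption.
    """
    if mode not in MODES:
        raise ValueError(f"unknown output mode: {mode!r}")

    before = dict(result)          # snapshot to detect new or changed keys
    for row_id, val in batch:
        result[row_id] = val       # fold the new data into the result table

    if mode == "append":
        # Only the rows that arrived in this batch; earlier output is never rewritten.
        return list(batch)
    if mode == "update":
        # Only keys that are new or whose value changed in this batch.
        return [(k, v) for k, v in result.items() if k not in before or before[k] != v]
    # complete: the whole result table, every batch.
    return sorted(result.items())


batches = [[(1, 10)], [(2, 20)], [(1, 15)]]
for mode in MODES:
    table: Dict[int, int] = {}
    print(mode, [process_batch(mode, table, b) for b in batches])
```

In this model only complete mode produces output that grows from batch to batch; append and update each emit a single row per batch for the sample data.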
Execution Sample
Apache Spark
# df is assumed to be a streaming DataFrame, e.g. one built with spark.readStream
query = (
    df.writeStream
    .outputMode('append')
    .format('console')
    .start()
)

query.awaitTermination()
This code starts a streaming query that prints only new rows arriving in each batch.
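To try the other modes without retyping the writer chain, the same calls can be wrapped in a small helper. This is a minimal sketch: `start_console_query` and `VALID_MODES` are hypothetical names, and `df` is assumed to be a streaming DataFrame created elsewhere.

```python
VALID_MODES = {"append", "complete", "update"}


def start_console_query(df, mode="append"):
    """Start a streaming query that prints each micro-batch to the console.

    `df` is assumed to be a streaming DataFrame; this helper only wraps the
    writeStream / outputMode / format / start chain shown above.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"output mode must be one of {sorted(VALID_MODES)}, got {mode!r}")
    return (
        df.writeStream
        .outputMode(mode)
        .format("console")
        .start()
    )


# Usage sketch (needs PySpark and a SparkSession named `spark`):
# df = spark.readStream.format("rate").option("rowsPerSecond", 1).load()
# query = start_console_query(df, "append")
# query.awaitTermination()
```

Passing "complete" for a DataFrame with no aggregation is rejected by Spark when the query starts, so the helper's own check only catches misspelled mode names.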
Execution Table
| Batch | New Data | Output Mode | Action | Output to Sink |
|---|---|---|---|---|
| 1 | [{id:1, val:10}] | append | Add new rows only | [{id:1, val:10}] |
| 2 | [{id:2, val:20}] | append | Add new rows only | [{id:2, val:20}] |
| 3 | [{id:1, val:15}] | append | Add new rows only | [{id:1, val:15}] |
| 1 | [{id:1, val:10}] | complete | Replace all rows | [{id:1, val:10}] |
| 2 | [{id:2, val:20}] | complete | Replace all rows | [{id:1, val:10}, {id:2, val:20}] |
| 3 | [{id:1, val:15}] | complete | Replace all rows | [{id:1, val:15}, {id:2, val:20}] |
| 1 | [{id:1, val:10}] | update | Write new or updated rows only | [{id:1, val:10}] |
| 2 | [{id:2, val:20}] | update | Write new or updated rows only | [{id:2, val:20}] |
| 3 | [{id:1, val:15}] | update | Write new or updated rows only | [{id:1, val:15}] |

The complete and update rows assume the query keeps one result row per id, holding the latest val.
💡 Streaming ends or is stopped manually; output depends on mode and data changes.
Variable Tracker
| Variable | Start | After Batch 1 | After Batch 2 | After Batch 3 |
|---|---|---|---|---|
| Output (append) | [] | [{id:1, val:10}] | [{id:2, val:20}] | [{id:1, val:15}] |
| Output (complete) | [] | [{id:1, val:10}] | [{id:1, val:10}, {id:2, val:20}] | [{id:1, val:15}, {id:2, val:20}] |
| Output (update) | [] | [{id:1, val:10}] | [{id:2, val:20}] | [{id:1, val:15}] |
Key Moments - 3 Insights
Why does the 'append' mode output only new rows even if some rows have changed?
Because 'append' mode only adds new rows that arrive in the batch. It does not update or replace existing rows, as shown in execution_table rows 1-3.
How does 'complete' mode handle updates to existing rows?
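'Complete' mode rewrites the entire result table to the sink every batch, including rows that were updated. Spark supports it only for queries that contain a streaming aggregation. This is why the output grows and is rewritten in full each batch, as in execution_table rows 4-6.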
What is the difference between 'update' and 'append' modes in handling changed rows?
'Update' mode outputs only new or changed rows, updating existing ones, while 'append' outputs only new rows without updates. See execution_table rows 7-9 for 'update' behavior.
Visual Quiz - 3 Questions
Test your understanding
Look at the execution_table row for batch 3 in 'append' mode. What rows are output to the sink?
A. [{id:1, val:10}]
B. [{id:1, val:15}]
C. [{id:1, val:10}, {id:2, val:20}, {id:1, val:15}]
D. []
💡 Hint
Check the 'Output to Sink' column for batch 3 under 'append' mode in the execution_table.
At which batch does the 'complete' mode output first include both the id:1 and id:2 rows?
A. Batch 1
B. Batch 3
C. Batch 2
D. Never
💡 Hint
Look at the 'Output to Sink' column for 'complete' mode in execution_table rows 4-6.
If the output mode changes from 'append' to 'update', how does the output after batch 3 change?
A. It outputs new and updated rows only.
B. It outputs all rows seen so far.
C. It outputs only new rows, ignoring updates.
D. It outputs nothing.
💡 Hint
Compare 'append' and 'update' outputs in variable_tracker after batch 3.
Concept Snapshot
Output modes control how streaming data is written:
- append: write only new rows
- complete: rewrite the full result table every batch
- update: write only rows that are new or changed
Choose mode based on sink and use case.
Append is simplest; complete and update handle changes.
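Choosing a mode also depends on the shape of the query, because Spark rejects some combinations when the query starts. The checker below is a simplified sketch of the best-known rules; `check_output_mode` and its parameters are hypothetical names, and it covers only plain and aggregated queries, not joins or other stateful operators.

```python
def check_output_mode(mode: str, has_aggregation: bool, has_watermark: bool = False) -> str:
    """Return "ok" or a short reason why the mode does not fit the query.

    Simplified assumptions: `has_aggregation` means the query contains a
    streaming aggregation, and `has_watermark` means a watermark is defined
    on the event-time column used by that aggregation.
    """
    if mode not in ("append", "complete", "update"):
        return "unknown output mode"
    if mode == "complete" and not has_aggregation:
        # The full result table is only kept for aggregated queries.
        return "complete mode needs a streaming aggregation"
    if mode == "append" and has_aggregation and not has_watermark:
        # Without a watermark an aggregated row can still change, so it is never final.
        return "append mode on an aggregation needs a watermark"
    return "ok"


for args in [("append", False), ("complete", False), ("append", True), ("update", True)]:
    print(args, "->", check_output_mode(*args))
```

Update mode passes every check here because, on a query without aggregation, it behaves the same as append.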
Full Transcript
This lesson shows how Apache Spark streaming uses output modes to control data written to sinks. When new data arrives, Spark checks the output mode. In append mode, only new rows are added to the output. Complete mode replaces the entire output with all rows seen so far, including updates. Update mode outputs only new or changed rows, updating existing ones. The execution table traces batches of data arriving and how each mode outputs data differently. Variable tracking shows how output changes after each batch. Key moments clarify common confusions about how updates are handled differently by each mode. The visual quiz tests understanding by asking about outputs at specific batches and mode changes. The snapshot summarizes the modes and their behavior for quick reference.