Output modes (append, complete, update) in Apache Spark - Step-by-Step Execution

Concept Flow - Output modes (append, complete, update)

Start Streaming Query

↓

New Data Arrives

↓

Check Output Mode

↓

Append: add new rows only | Complete: replace all rows | Update: write changed rows only

↓

Write to Sink

↓

Wait for next batch

↓

Repeat

This flow shows how Spark Structured Streaming handles each batch of new data according to the output mode: append writes only new rows, complete rewrites the entire result table, and update writes only the rows that are new or changed since the previous batch.
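The flow can be imitated in plain Python to see what each mode would hand to the sink. This is a rough sketch, not Spark code: `write_batch` and `simulate` are hypothetical helpers, rows are `(id, val)` tuples, and the complete and update branches assume the query keeps one row per id holding the largest `val` seen so far (a stand-in for an aggregation).

```python
from typing import Dict, List, Tuple

Row = Tuple[int, int]  # (id, val)

MODES = ("append", "complete", "update")


def write_batch(result: Dict[int, int], batch: List[Row], mode: str) -> List[Row]:
    """Return the rows one micro-batch would write to the sink in the given mode."""
    if mode not in MODES:
        raise ValueError(f"unknown output mode: {mode}")

    if mode == "append":
        # No aggregation assumed: every incoming row is a new result row.
        return list(batch)

    # Assumed aggregation for complete/update: keep max(val) per id.
    changed: List[int] = []
    for key, val in batch:
        new_val = max(val, result.get(key, val))
        if result.get(key) != new_val:
            result[key] = new_val
            if key not in changed:
                changed.append(key)

    if mode == "update":
        # Only the rows that are new or changed in this batch.
        return [(key, result[key]) for key in changed]

    # complete: the whole result table, every batch.
    return sorted(result.items())


def simulate(mode: str) -> List[List[Row]]:
    """Run the three sample batches through one mode."""
    batches = [[(1, 10)], [(2, 20)], [(1, 15)]]
    result: Dict[int, int] = {}
    return [write_batch(result, batch, mode) for batch in batches]


if __name__ == "__main__":
    for mode in MODES:
        print(mode, simulate(mode))
```

With these three batches, append and update happen to write the same rows, while complete rewrites the growing table each time. The difference between append and update would show up in what the rows mean: append rows are never revised, update rows may supersede earlier ones.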

Execution Sample

Apache Spark

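# Assumes `df` is a streaming DataFrame (for example, one created with spark.readStream).
query = (
    df.writeStream
    .outputMode('append')   # write only rows newly added to the result
    .format('console')      # print each micro-batch to stdout
    .start()
)

# Block until the query is stopped or fails.
query.awaitTermination()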

This code starts a streaming query that prints only new rows arriving in each batch.

Execution Table

Batch	New Data	Output Mode	Action	Output to Sink
1	[{id:1, val:10}]	append	Add new rows only	[{id:1, val:10}]
2	[{id:2, val:20}]	append	Add new rows only	[{id:2, val:20}]
3	[{id:1, val:15}]	append	Add new rows only	[{id:1, val:15}]
1	[{id:1, val:10}]	complete	Rewrite entire result table	[{id:1, val:10}]
2	[{id:2, val:20}]	complete	Rewrite entire result table	[{id:1, val:10}, {id:2, val:20}]
3	[{id:1, val:15}]	complete	Rewrite entire result table	[{id:1, val:15}, {id:2, val:20}]
1	[{id:1, val:10}]	update	Write new or changed rows only	[{id:1, val:10}]
2	[{id:2, val:20}]	update	Write new or changed rows only	[{id:2, val:20}]
3	[{id:1, val:15}]	update	Write new or changed rows only	[{id:1, val:15}]

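💡 The query runs until it ends or is stopped manually; what each batch writes depends on the output mode and on which rows changed. The complete and update rows assume the query keeps one row per id (an aggregation), since Spark does not allow complete mode on queries without aggregation.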

Variable Tracker

Variable	Start	After Batch 1	After Batch 2	After Batch 3
Output (append)	[]	[{id:1, val:10}]	[{id:2, val:20}]	[{id:1, val:15}]
Output (complete)	[]	[{id:1, val:10}]	[{id:1, val:10}, {id:2, val:20}]	[{id:1, val:15}, {id:2, val:20}]
Output (update)	[]	[{id:1, val:10}]	[{id:2, val:20}]	[{id:1, val:15}]
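What a sink ends up holding after each batch depends on how it applies the rows a mode emits. The sketch below is plain Python, not a Spark sink: `apply_to_sink` and `replay` are hypothetical helpers, rows are `(id, val)` tuples, and upserting by id for update mode is an assumption (a real sink decides for itself how to treat update-mode rows; the console sink, for instance, simply prints them).

```python
from typing import List, Tuple

Row = Tuple[int, int]  # (id, val)


def apply_to_sink(sink: List[Row], emitted: List[Row], mode: str) -> List[Row]:
    """Return the sink contents after one batch's emitted rows are applied."""
    if mode == "append":
        # Rows are only ever added; earlier rows are never revised.
        return sink + list(emitted)
    if mode == "complete":
        # Each batch carries the full result table, so it replaces the old contents.
        return list(emitted)
    if mode == "update":
        # Assumed sink behaviour: upsert by id, keeping the latest value.
        merged = dict(sink)
        merged.update(dict(emitted))
        return sorted(merged.items())
    raise ValueError(f"unknown output mode: {mode}")


def replay(mode: str, emitted_batches: List[List[Row]]) -> List[Row]:
    """Apply a sequence of emitted batches to an initially empty sink."""
    sink: List[Row] = []
    for emitted in emitted_batches:
        sink = apply_to_sink(sink, emitted, mode)
    return sink


if __name__ == "__main__":
    changed_rows = [[(1, 10)], [(2, 20)], [(1, 15)]]
    full_tables = [[(1, 10)], [(1, 10), (2, 20)], [(1, 15), (2, 20)]]
    print("append  ", replay("append", changed_rows))
    print("update  ", replay("update", changed_rows))
    print("complete", replay("complete", full_tables))
```

Under these assumptions an upserting sink fed by update mode converges to the same contents as a sink overwritten by complete mode, while receiving far fewer rows per batch.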

Key Moments - 3 Insights

Why does the 'append' mode output only new rows even if some rows have changed?

How does 'complete' mode handle updates to existing rows?

What is the difference between 'update' and 'append' modes in handling changed rows?

Visual Quiz - 3 Questions

Test your understanding

Look at the Execution Table row for batch 3 in 'append' mode. What rows are output to the sink?

A. [{id:1, val:10}]

B. [{id:1, val:15}]

C. [{id:1, val:10}, {id:2, val:20}, {id:1, val:15}]

D. []

Concept Snapshot

Output modes control how streaming results are written to the sink:
- append: write only new rows
- complete: rewrite the entire result table every batch (aggregation queries only)
- update: write only rows that are new or changed since the last batch
Choose the mode based on the sink and the use case.
Append is the simplest; complete and update handle rows that change.
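Which modes a query accepts also depends on the shape of the query. The helper below is a simplified sketch of the compatibility rules in the Structured Streaming guide: `supported_modes` is a hypothetical function, not a Spark API, and it deliberately ignores joins and arbitrary stateful operations, which have their own restrictions.

```python
from typing import Set


def supported_modes(has_aggregation: bool, has_watermark: bool = False) -> Set[str]:
    """Return the output modes a query of this shape is expected to accept.

    has_aggregation: the query contains a streaming aggregation.
    has_watermark:   the aggregation is on event time with a watermark
                     (ignored when there is no aggregation).
    """
    if not has_aggregation:
        # No result rows ever change, and keeping the full table is not supported.
        return {"append", "update"}
    if has_watermark:
        # The watermark lets Spark finalize old groups, so append becomes possible.
        return {"append", "update", "complete"}
    # Aggregated rows can keep changing indefinitely, so append is rejected.
    return {"update", "complete"}


if __name__ == "__main__":
    print(sorted(supported_modes(has_aggregation=False)))
    print(sorted(supported_modes(has_aggregation=True)))
    print(sorted(supported_modes(has_aggregation=True, has_watermark=True)))
```

A check like this could run before `start()` to give a clearer message than the analysis error Spark raises for an unsupported combination.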

Full Transcript

This lesson shows how Apache Spark streaming uses output modes to control data written to sinks. When new data arrives, Spark checks the output mode. In append mode, only new rows are added to the output. Complete mode replaces the entire output with all rows seen so far, including updates. Update mode outputs only new or changed rows, updating existing ones. The execution table traces batches of data arriving and how each mode outputs data differently. Variable tracking shows how output changes after each batch. Key moments clarify common confusions about how updates are handled differently by each mode. The visual quiz tests understanding by asking about outputs at specific batches and mode changes. The snapshot summarizes the modes and their behavior for quick reference.