Output modes (append, complete, update) in Apache Spark
Output modes determine how the result of a streaming query is written to the sink as new data arrives: only new rows, only changed rows, or the entire result table.
df.writeStream.outputMode("mode").format("console").start()
Replace "mode" with one of: "append", "complete", or "update".
Output mode controls how the streaming results are written to the sink.
df.writeStream.outputMode("append").format("console").start()
df.writeStream.outputMode("complete").format("console").start()
df.writeStream.outputMode("update").format("console").start()
The following program reads a streaming rate source, groups rows into 10-second windows, counts them, and prints the full aggregated result on every trigger using the 'complete' output mode.
from pyspark.sql import SparkSession
from pyspark.sql.functions import window

spark = SparkSession.builder.appName("OutputModesExample").getOrCreate()

# Create streaming DataFrame from rate source (rows with timestamp and value)
inputStream = spark.readStream.format("rate").option("rowsPerSecond", 5).load()

# Aggregate count of rows per 10-second window
aggStream = inputStream.groupBy(window("timestamp", "10 seconds")).count()

# Start query with output mode 'complete' to see the full aggregation each trigger
query = aggStream.writeStream \
    .outputMode("complete") \
    .format("console") \
    .option("truncate", False) \
    .start()

query.awaitTermination(10)  # in PySpark the timeout is in seconds, so this runs ~10 seconds
query.stop()
spark.stop()
Append mode emits only new rows that will never change; for aggregations it requires a watermark so Spark knows when a result is final.
Complete mode rewrites the entire result table on every trigger, which can be costly for large result tables, and is supported only for aggregation queries.
Update mode emits only the rows that changed since the last trigger, but requires a sink that can handle in-place updates.
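To make the three behaviors concrete without running Spark, here is a toy pure-Python simulation (the micro-batch data and function name are invented for illustration) of a running word count, showing what each mode would hand to the sink on each trigger:

```python
from collections import Counter

def run_modes(batches):
    """Toy simulation: for each micro-batch of words, record what each
    output mode would hand to the sink for a running word count."""
    state = Counter()
    emitted = {"append": [], "update": [], "complete": []}
    for batch in batches:
        before = dict(state)
        state.update(batch)
        # append (for a non-aggregated query): just the new input rows;
        # aggregated rows cannot be appended until a watermark closes them.
        emitted["append"].append(list(batch))
        # update: only the counts that changed in this trigger
        emitted["update"].append(
            {w: c for w, c in state.items() if before.get(w) != c})
        # complete: the whole result table, every trigger
        emitted["complete"].append(dict(state))
    return emitted

out = run_modes([["a", "b"], ["a"]])
print(out["update"])    # [{'a': 1, 'b': 1}, {'a': 2}]
print(out["complete"])  # [{'a': 1, 'b': 1}, {'a': 2, 'b': 1}]
```

Note how update re-emits only 'a' in the second trigger, while complete repeats the unchanged 'b' row as well.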
Output modes control how streaming results are shown or saved.
Append adds new rows only, complete shows all results, update shows changed rows.
Choose mode based on your data update needs and sink capabilities.