Output modes (append, complete, update) in Apache Spark
Output modes determine how the result of a streaming query is written to the sink as new data arrives: only new rows, only changed rows, or the entire result table.
df.writeStream.outputMode("mode").format("console").start()
Replace "mode" with one of: "append", "complete", or "update".
Output mode controls how the streaming results are written to the sink.
df.writeStream.outputMode("append").format("console").start()
df.writeStream.outputMode("complete").format("console").start()
df.writeStream.outputMode("update").format("console").start()
The following program reads a streaming rate source, groups rows into 10-second windows, counts them, and prints the full aggregated result on every trigger using the 'complete' output mode.
from pyspark.sql import SparkSession
from pyspark.sql.functions import window

spark = SparkSession.builder.appName("OutputModesExample").getOrCreate()

# Create streaming DataFrame from rate source (rows with timestamp and value)
inputStream = spark.readStream.format("rate").option("rowsPerSecond", 5).load()

# Aggregate count of rows per 10-second window
aggStream = inputStream.groupBy(window("timestamp", "10 seconds")).count()

# Start query with output mode 'complete' to see the full aggregation each trigger
query = aggStream.writeStream \
    .outputMode("complete") \
    .format("console") \
    .option("truncate", False) \
    .start()

query.awaitTermination(10)  # in PySpark the timeout is in seconds, so this runs ~10 seconds
query.stop()
spark.stop()
Append mode emits only new rows that will never change; for aggregations it requires a watermark so Spark knows when a result is final.
Complete mode rewrites the entire result table on every trigger, which can be costly for large result tables, and is supported only for aggregation queries.
Update mode emits only the rows that changed since the last trigger, but requires a sink that can handle in-place updates.
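To make the three behaviors concrete without running Spark, here is a toy pure-Python simulation (the micro-batch data and function name are invented for illustration) of a running word count, showing what each mode would hand to the sink on each trigger:

```python
from collections import Counter

def run_modes(batches):
    """Toy simulation: for each micro-batch of words, record what each
    output mode would hand to the sink for a running word count."""
    state = Counter()
    emitted = {"append": [], "update": [], "complete": []}
    for batch in batches:
        before = dict(state)
        state.update(batch)
        # append (for a non-aggregated query): just the new input rows;
        # aggregated rows cannot be appended until a watermark closes them.
        emitted["append"].append(list(batch))
        # update: only the counts that changed in this trigger
        emitted["update"].append(
            {w: c for w, c in state.items() if before.get(w) != c})
        # complete: the whole result table, every trigger
        emitted["complete"].append(dict(state))
    return emitted

out = run_modes([["a", "b"], ["a"]])
print(out["update"])    # [{'a': 1, 'b': 1}, {'a': 2}]
print(out["complete"])  # [{'a': 1, 'b': 1}, {'a': 2, 'b': 1}]
```

Note how update re-emits only 'a' in the second trigger, while complete repeats the unchanged 'b' row as well.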
Output modes control how streaming results are shown or saved.
Append adds new rows only, complete shows all results, update shows changed rows.
Choose mode based on your data update needs and sink capabilities.