How to Create a Temp View in PySpark: Simple Guide
In PySpark, you create a temporary view from a DataFrame using createOrReplaceTempView. This lets you run SQL queries on the DataFrame using Spark SQL syntax. The temp view exists only for the lifetime of the Spark session that created it.
Syntax
The syntax to create a temporary view in PySpark is simple:
dataframe.createOrReplaceTempView(view_name)
Here, dataframe is your Spark DataFrame, and view_name is the name you want to give to the temporary view. This view can then be queried using Spark SQL.
```python
dataframe.createOrReplaceTempView("view_name")
```
Example
This example shows how to create a temporary view from a DataFrame and run a SQL query on it.
```python
from pyspark.sql import SparkSession

# Start Spark session
spark = SparkSession.builder.appName("TempViewExample").getOrCreate()

# Create sample data
data = [(1, "Alice"), (2, "Bob"), (3, "Cathy")]
columns = ["id", "name"]

# Create DataFrame
df = spark.createDataFrame(data, columns)

# Create temp view
df.createOrReplaceTempView("people")

# Run SQL query on temp view
result = spark.sql("SELECT * FROM people WHERE id > 1")

# Show result
result.show()
```
Output
```
+---+-----+
| id| name|
+---+-----+
|  2|  Bob|
|  3|Cathy|
+---+-----+
```
Common Pitfalls
Common mistakes when creating temp views include:
- Trying to use a temp view after the Spark session that created it has ended (temp views last only for that session).
- Using createTempView instead of createOrReplaceTempView. createTempView fails if a view with the same name already exists.
- Running SQL queries that reference a view name before calling createOrReplaceTempView on the DataFrame.
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("PitfallExample").getOrCreate()

data = [(1, "Alice")]
columns = ["id", "name"]
df = spark.createDataFrame(data, columns)

# Wrong: Using createTempView twice causes an error if the view exists
# df.createTempView("people")
# df.createTempView("people")  # This will raise an error

# Right: Use createOrReplaceTempView to avoid the error
df.createOrReplaceTempView("people")
df.createOrReplaceTempView("people")  # This replaces the view safely
```
Quick Reference
Summary tips for creating temp views in PySpark:
- Use createOrReplaceTempView to create or update a temp view.
- Temp views last only during the Spark session.
- Query temp views using spark.sql("SELECT ...").
- Temp views are useful for running SQL on DataFrames without saving data.
Key Takeaways
Use createOrReplaceTempView on a DataFrame to create a temporary SQL view.
Temporary views exist only during the Spark session and disappear after it ends.
Run SQL queries on temp views using spark.sql with the view name.
Avoid createTempView if you want to replace an existing view without errors.
Temp views let you use SQL syntax on DataFrames without saving data permanently.