How to Use Spark UI in PySpark for Monitoring Jobs
To use the Spark UI in PySpark, start your Spark session and open the web UI at http://localhost:4040 in your browser. This interface shows detailed information about running and completed Spark jobs, stages, and tasks.
Syntax
The Spark UI is automatically available when you create a Spark session in PySpark. You do not need to write special code to enable it. The key parts are:
- SparkSession: The entry point for Spark functionality.
- spark.sparkContext.uiWebUrl: Property that holds the URL of the Spark UI.
- Access the UI by opening the URL in a web browser, usually http://localhost:4040.
python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('ExampleApp').getOrCreate()
print('Spark UI URL:', spark.sparkContext.uiWebUrl)
Output
Spark UI URL: http://localhost:4040
Example
This example creates a Spark session, runs a simple job, and prints the Spark UI URL. You can open this URL in your browser to see job details like stages, tasks, and executors.
python
from pyspark.sql import SparkSession

# Create Spark session
spark = SparkSession.builder.appName('SparkUIExample').getOrCreate()

# Create a simple DataFrame
data = [(1, 'apple'), (2, 'banana'), (3, 'cherry')]
df = spark.createDataFrame(data, ['id', 'fruit'])

# Perform a transformation and action
filtered_df = df.filter(df.id > 1)
result = filtered_df.collect()

# Print the result
print('Filtered Data:', result)

# Print Spark UI URL
print('Access Spark UI at:', spark.sparkContext.uiWebUrl)

# Keep the session alive to view the UI
input('Press Enter to stop Spark session...')
spark.stop()
Output
Filtered Data: [Row(id=2, fruit='banana'), Row(id=3, fruit='cherry')]
Access Spark UI at: http://localhost:4040
Common Pitfalls
Some common mistakes when using Spark UI in PySpark include:
- Not keeping the Spark session alive, so the UI disappears immediately after the job finishes.
- Trying to access the UI on a remote cluster without proper port forwarding or network access.
- Assuming the UI is always at localhost:4040; if multiple Spark apps run, the port may increment (4041, 4042, etc.). Always check spark.sparkContext.uiWebUrl to get the correct URL.
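Because the port can shift, it is safer to read the port out of the URL than to hard-code 4040. The sketch below uses Python's standard urllib.parse on a hypothetical URL string (the kind of value spark.sparkContext.uiWebUrl returns when port 4040 was already taken); it does not require a running Spark session.

```python
from urllib.parse import urlparse

# Hypothetical value, as spark.sparkContext.uiWebUrl might return it
# when another Spark app already occupies the default port 4040
ui_url = "http://localhost:4041"

parsed = urlparse(ui_url)
print("Spark UI host:", parsed.hostname)
print("Spark UI port:", parsed.port)
```

In a live session you would pass spark.sparkContext.uiWebUrl instead of the literal string.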
python
from pyspark.sql import SparkSession

# Wrong: Spark session stops immediately, so the UI closes
spark = SparkSession.builder.appName('WrongExample').getOrCreate()
spark.stop()

# Right: keep the session alive to view the UI
spark = SparkSession.builder.appName('RightExample').getOrCreate()
input('Press Enter to stop Spark session...')
spark.stop()
Quick Reference
Tips for using Spark UI in PySpark:
- Access the UI URL via spark.sparkContext.uiWebUrl.
- Default port is 4040; if busy, ports increment.
- Keep Spark session running to keep UI accessible.
- Use UI tabs: Jobs, Stages, Storage, Environment, Executors for detailed info.
- For remote clusters, use SSH port forwarding to access UI.
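The last tip can be sketched as an SSH local-forward command. This is a hedged example: the hostname (cluster-host) and username (user) are placeholders for your environment, and it assumes the Spark driver's UI is on port 4040 of the remote machine.

```shell
# Forward local port 4040 to the Spark UI port on the remote driver.
# 'user' and 'cluster-host' are placeholders for your own credentials/host.
ssh -L 4040:localhost:4040 user@cluster-host
```

While the tunnel is open, browsing to http://localhost:4040 on your local machine shows the remote Spark UI.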
Key Takeaways
- Spark UI is automatically available when you start a Spark session in PySpark.
- Access the UI URL via spark.sparkContext.uiWebUrl, usually at http://localhost:4040.
- Keep your Spark session alive to keep the UI accessible for monitoring.
- The UI shows detailed info about jobs, stages, tasks, and executors.
- For remote clusters, use port forwarding to access the Spark UI.
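The job, stage, and executor details shown in the UI are also exposed by Spark's monitoring REST API, served under /api/v1 on the same port as the UI. The sketch below only builds the endpoint URL from a hypothetical UI URL (no Spark session is required to run it); the commented lines show how you might fetch live data from a running application.

```python
import json
from urllib.request import urlopen  # noqa: F401 (used in the commented fetch below)

# Hypothetical UI URL; with a live session, use spark.sparkContext.uiWebUrl
ui_url = "http://localhost:4040"

# Spark's monitoring REST API lives under /api/v1 on the UI port
apps_endpoint = f"{ui_url}/api/v1/applications"
print("Applications endpoint:", apps_endpoint)

# With a running Spark session, you could fetch job details like this:
# apps = json.load(urlopen(apps_endpoint))
# jobs = json.load(urlopen(f"{apps_endpoint}/{apps[0]['id']}/jobs"))
```

This is handy when you want to script monitoring instead of clicking through the UI tabs.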