Apache Spark · How-To · Beginner · 3 min read

How to Use Spark UI in PySpark for Monitoring Jobs

To use the Spark UI in PySpark, start your Spark session and open the web UI in your browser, by default at http://localhost:4040. This interface shows detailed information about running and completed Spark jobs, stages, and tasks.
📝

Syntax

The Spark UI is automatically available when you create a Spark session in PySpark. You do not need to write special code to enable it. The key parts are:

  • SparkSession: The entry point for Spark functionality.
  • spark.sparkContext.uiWebUrl: Property that holds the URL of the Spark UI.
  • Access the UI by opening the URL in a web browser, usually http://localhost:4040.
python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('ExampleApp').getOrCreate()

print('Spark UI URL:', spark.sparkContext.uiWebUrl)
Output
Spark UI URL: http://localhost:4040
💻

Example

This example creates a Spark session, runs a simple job, and prints the Spark UI URL. You can open this URL in your browser to see job details like stages, tasks, and executors.

python
from pyspark.sql import SparkSession

# Create Spark session
spark = SparkSession.builder.appName('SparkUIExample').getOrCreate()

# Create a simple DataFrame
data = [(1, 'apple'), (2, 'banana'), (3, 'cherry')]
df = spark.createDataFrame(data, ['id', 'fruit'])

# Perform a transformation and action
filtered_df = df.filter(df.id > 1)
result = filtered_df.collect()

# Print the result
print('Filtered Data:', result)

# Print Spark UI URL
print('Access Spark UI at:', spark.sparkContext.uiWebUrl)

# Keep the session alive to view UI
input('Press Enter to stop Spark session...')

spark.stop()
Output
Filtered Data: [Row(id=2, fruit='banana'), Row(id=3, fruit='cherry')]
Access Spark UI at: http://localhost:4040
⚠️

Common Pitfalls

Some common mistakes when using Spark UI in PySpark include:

  • Not keeping the Spark session alive, so the UI disappears immediately after the job finishes.
  • Trying to access the UI on a remote cluster without proper port forwarding or network access.
  • Assuming the UI is always at localhost:4040; if multiple Spark apps run, the port may increment (4041, 4042, etc.).

Always check spark.sparkContext.uiWebUrl to get the correct URL.

python
from pyspark.sql import SparkSession

# Wrong: Spark session stops immediately, UI closes
spark = SparkSession.builder.appName('WrongExample').getOrCreate()
spark.stop()

# Right: Keep session alive to view UI
spark = SparkSession.builder.appName('RightExample').getOrCreate()
input('Press Enter to stop Spark session...')
spark.stop()
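Why the port can increment: Spark tries spark.ui.port (default 4040) and, if that port is taken, retries the next one up to spark.port.maxRetries times. The standalone sketch below mimics that probing logic with plain sockets so you can see it without a cluster; find_ui_port is a hypothetical helper for illustration, not a Spark API.

```python
import socket

def find_ui_port(start=4040, max_retries=16):
    """Hypothetical helper mimicking Spark's UI port selection:
    try the starting port and increment until a free one is found."""
    for offset in range(max_retries + 1):
        port = start + offset
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind(('localhost', port))
                return port  # free: Spark would bind its UI here
            except OSError:
                continue  # busy: Spark increments and retries
    raise RuntimeError(f'No free port in range {start}-{start + max_retries}')

print('UI would bind to port:', find_ui_port())
```

This is why a second PySpark app on the same machine ends up at 4041: the first app already holds 4040, so the probe moves on.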
📊

Quick Reference

Tips for using Spark UI in PySpark:

  • Access UI URL via spark.sparkContext.uiWebUrl.
  • Default port is 4040; if busy, ports increment.
  • Keep Spark session running to keep UI accessible.
  • Use UI tabs: Jobs, Stages, Storage, Environment, Executors for detailed info.
  • For remote clusters, use SSH port forwarding to access UI.
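The last tip, SSH port forwarding, typically looks like the command fragment below; user and driver-host are placeholders for your own cluster, and the remote port may differ, so check spark.sparkContext.uiWebUrl on the driver first.

```shell
# Forward local port 4040 to the Spark UI on the driver node.
# 'user' and 'driver-host' are placeholders for your environment.
ssh -L 4040:localhost:4040 user@driver-host

# Then open http://localhost:4040 in your local browser.
```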
✅

Key Takeaways

Spark UI is automatically available when you start a Spark session in PySpark.
Access the UI URL via spark.sparkContext.uiWebUrl, usually at http://localhost:4040.
Keep your Spark session alive to keep the UI accessible for monitoring.
The UI shows detailed info about jobs, stages, tasks, and executors.
For remote clusters, use port forwarding to access the Spark UI.