Apache Spark · ~10 mins

Spark UI for debugging performance in Apache Spark - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
1. Fill in the blank (easy)

Complete the code to start a Spark session with the UI enabled.

from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('TestApp').[1]()
A) launchUI
B) enableUI
C) getOrCreate
D) startUI
Common Mistakes
Using a method that does not exist like enableUI or startUI.
Forgetting to call a method to create the session.
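
For reference, the completed call is `SparkSession.builder.appName('TestApp').getOrCreate()` — there is no builder method for the UI. The web UI is on by default and is controlled through Spark configuration instead; a sketch of the relevant `spark-defaults.conf` entries (the values shown are the documented defaults, for illustration):

```properties
# Spark web UI settings (defaults shown)
spark.ui.enabled  true
spark.ui.port     4040
```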
2. Fill in the blank (medium)

Complete the code to access the Spark UI web URL from the SparkContext.

ui_url = spark.sparkContext.[1]
A) uiWebUrl()
B) uiWebUrl
C) getUIUrl()
D) getUIWebUrl
Common Mistakes
Calling uiWebUrl as a method with parentheses.
Using incorrect method names like getUIUrl.
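
`uiWebUrl` is a property on `SparkContext`, not a method, which is why the parentheses are the trap here. A minimal pure-Python sketch using a hypothetical stand-in class (not the real `SparkContext`) to show the difference:

```python
class FakeContext:
    """Hypothetical stand-in for SparkContext, for illustration only."""

    @property
    def uiWebUrl(self):
        # The real property returns the UI address, e.g. "http://host:4040".
        return "http://localhost:4040"


ctx = FakeContext()
print(ctx.uiWebUrl)  # accessed without parentheses -> the URL string
# ctx.uiWebUrl() would raise TypeError: 'str' object is not callable,
# because the property has already returned a plain string.
```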
3. Fill in the blank (hard)

Fix the error in the code to get the duration of the last Spark job from the Spark UI listener.

listener = spark.sparkContext.statusTracker()
last_job = listener.getJobIdsForGroup([1])[-1]
duration = listener.getJobInfo(last_job).[2]
A) 0, 'completionTime'
B) 0, 'duration'
C) None, 'completionTime'
D) None, 'duration'
Common Mistakes
Using None instead of 0 for the job group.
Using 'duration' which is not a valid property.
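
Since running a real cluster isn't needed to see the lookup pattern, here is a runnable sketch with hypothetical namedtuple stand-ins for the tracker and job-info objects. Note that the exact fields exposed by PySpark's status tracker vary by Spark version; `completionTime` follows the quiz's assumed API and may not be available everywhere.

```python
from collections import namedtuple

# Hypothetical stand-in mirroring the quiz's assumed fields.
JobInfo = namedtuple("JobInfo", ["jobId", "completionTime"])


class FakeStatusTracker:
    """Stand-in for the StatusTracker interface used in the task."""

    def __init__(self, jobs):
        self._jobs = jobs

    def getJobIdsForGroup(self, group=None):
        # Ignores the group argument; returns all known job ids in order.
        return [j.jobId for j in self._jobs]

    def getJobInfo(self, job_id):
        return next(j for j in self._jobs if j.jobId == job_id)


tracker = FakeStatusTracker([JobInfo(0, 1200), JobInfo(1, 800)])
last_job = tracker.getJobIdsForGroup(0)[-1]  # quiz passes 0 as the group
duration = tracker.getJobInfo(last_job).completionTime
print(duration)  # -> 800
```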
4. Fill in the blank (hard)

Fill both blanks to create a dictionary of stage IDs and their durations from the Spark UI listener.

stage_info = {stage.stageId: stage.[1] for stage in spark.sparkContext.statusTracker().[2]()}
A) completionTime
B) getActiveStages
C) getAllStages
D) duration
Common Mistakes
Using 'duration' instead of 'completionTime'.
Using 'getActiveStages' which returns only active stages.
5. Fill in the blank (hard)

Fill all three blanks to filter stages with duration greater than 1000 ms and create a dictionary of their IDs and durations.

filtered_stages = {stage.[1]: stage.[2] for stage in spark.sparkContext.statusTracker().[3]() if stage.completionTime and stage.completionTime > 1000}
A) stageId
B) completionTime
C) getAllStages
D) getActiveStages
Common Mistakes
Using 'getActiveStages' which misses completed stages.
Using wrong property names like 'duration'.
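
The filter-and-collect pattern in the answer is plain Python, so it can be tried without a cluster. A sketch using namedtuple stand-ins for the stage objects (the field names are the quiz's assumptions, not guaranteed across Spark versions):

```python
from collections import namedtuple

# Hypothetical stand-in for the stage info objects the task assumes.
StageInfo = namedtuple("StageInfo", ["stageId", "completionTime"])

# A None completionTime models a stage that has not finished yet.
stages = [StageInfo(1, 500), StageInfo(2, 1500), StageInfo(3, None)]

# Same comprehension as the answer, with the tracker call swapped for a list;
# the truthiness check skips unfinished stages before comparing to 1000.
filtered_stages = {
    s.stageId: s.completionTime
    for s in stages
    if s.completionTime and s.completionTime > 1000
}
print(filtered_stages)  # -> {2: 1500}
```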