Practice - 5 Tasks
Answer the questions below.
1. Fill in the blank (easy) — Apache Spark
Complete the code to start a Spark session with the UI enabled.

    from pyspark.sql import SparkSession
    spark = SparkSession.builder.appName('TestApp').[1]()
Common Mistakes
Using a method that does not exist like enableUI or startUI.
Forgetting to call a method to create the session.
The method getOrCreate() creates the Spark session (or returns an existing one); the Spark UI is enabled by default (spark.ui.enabled=true), so no separate method is needed to start it.
2. Fill in the blank (medium) — Apache Spark
Complete the code to access the Spark UI web URL from the SparkContext.

    ui_url = spark.sparkContext.[1]
Common Mistakes
Calling uiWebUrl as a method with parentheses.
Using incorrect method names like getUIUrl.
The uiWebUrl property (accessed without parentheses) gives the Spark UI web address as a string.
3. Fill in the blank (hard) — Apache Spark
Fix the error in the code to get the duration of the last Spark job from the Spark UI listener.

    listener = spark.sparkContext.statusTracker()
    last_job = listener.getJobIdsForGroup([1])[-1]
    duration = listener.getJobInfo(last_job).[2]
Common Mistakes
Using None instead of 0 for the job group.
Using 'duration' which is not a valid property.
The first argument is 0 to get the first job group, and 'completionTime' is the correct attribute for duration.
4fill in blank
hardFill both blanks to create a dictionary of stage IDs and their durations from the Spark UI listener.
Apache Spark
stage_info = {stage.stageId: stage.[1] for stage in spark.sparkContext.statusTracker.[2]()} Drag options to blanks, or click blank then click option'
Common Mistakes
Using 'duration' instead of 'completionTime'.
Using 'getActiveStages' which returns only active stages.
Use 'completionTime' for stage duration and 'getAllStages' to get all stages.
5. Fill in the blank (hard) — Apache Spark
Fill all three blanks to filter stages with a duration greater than 1000 ms and create a dictionary of their IDs and durations.

    filtered_stages = {stage.[1]: stage.[2]
                       for stage in spark.sparkContext.statusTracker().[3]()
                       if stage.completionTime and stage.completionTime > 1000}
Common Mistakes
Using 'getActiveStages' which misses completed stages.
Using wrong property names like 'duration'.
Use 'stageId' and 'completionTime' properties and 'getAllStages()' method to filter stages by duration.