
Why optimization prevents job failures in Apache Spark - Test Your Understanding

Practice - 5 Tasks
Answer the questions below
Task 1: fill in the blank (easy)

Complete the code to cache the DataFrame to optimize performance.

Apache Spark
df = spark.read.csv('data.csv')
df.[1]()
A. show
B. collect
C. cache
D. count
Common Mistakes
Using 'collect' which brings data to the driver and can cause memory errors.
Using 'show' which only displays data but does not optimize.
Using 'count' which triggers computation but does not cache.
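The point of `cache()` can be illustrated outside Spark with plain-Python memoization: an expensive computation runs once, and repeated requests reuse the stored result instead of recomputing. This is only an analogy for intuition; `functools.lru_cache` is not Spark's caching mechanism.

```python
from functools import lru_cache

calls = {"count": 0}  # tracks how many times the real work runs

@lru_cache(maxsize=None)
def expensive(x):
    # Stands in for an expensive DataFrame computation.
    calls["count"] += 1
    return x * x

expensive(4)  # computed once
expensive(4)  # served from the cache, no recomputation
```

After both calls, the underlying work has run only once, which is exactly why caching a reused DataFrame avoids repeated recomputation of its lineage.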
Task 2: fill in the blank (medium)

Complete the code to repartition the DataFrame for better parallelism.

Apache Spark
df = df.[1](10)
A. repartition
B. persist
C. collect
D. cache
Common Mistakes
Using 'cache' which stores data but does not change partitions.
Using 'collect' which brings data to the driver and can cause failures.
Using 'persist' which is similar to cache but does not repartition.
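What `repartition(10)` achieves, spreading rows across a chosen number of partitions so more workers can process them in parallel, can be sketched in plain Python with a round-robin split. This is an analogy, not Spark's shuffle; the function name and inputs are illustrative only.

```python
def repartition(items, n):
    # Round-robin split into n partitions, loosely analogous to
    # df.repartition(n). Spark additionally shuffles data across
    # the cluster; this sketch only shows the even distribution.
    parts = [[] for _ in range(n)]
    for i, item in enumerate(items):
        parts[i % n].append(item)
    return parts

parts = repartition(list(range(20)), 4)  # 20 rows into 4 partitions
```

Each of the 4 partitions ends up with 5 items, so 4 workers could process them concurrently instead of one worker handling all 20.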
Task 3: fill in the blank (hard)

Fix the error in the code to avoid job failure by using the correct action.

Apache Spark
result = df.filter(df.age > 30)
result.[1]()
A. persist
B. show
C. cache
D. map
Common Mistakes
Using 'cache' or 'persist' without a subsequent action, so the job never actually runs.
Using 'map', which is not a DataFrame method in PySpark and is not an action, causing errors.
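The distinction this task tests, that transformations are lazy and only an action triggers execution, can be sketched with a plain-Python generator. This is an analogy, not Spark: the generator stands in for `df.filter`, and consuming it with `list()` stands in for an action like `show()`.

```python
# Sample rows, standing in for a DataFrame with an 'age' column.
data = [{"age": 25}, {"age": 31}, {"age": 40}]

# Lazy, like df.filter(df.age > 30): nothing is evaluated yet.
filtered = (row for row in data if row["age"] > 30)

# Only consuming the generator (the "action") does the work.
result = list(filtered)
```

Until `list()` runs, no row has been examined, which mirrors why calling only `cache()` or `persist()` on a filtered DataFrame leaves the job unexecuted.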
Task 4: fill in the blank (hard)

Fill both blanks to create a dictionary of word lengths for words longer than 3 characters.

Python
lengths = {word: [1] for word in words if len(word) [2] 3}
A. len(word)
B. >
C. <
D. word
Common Mistakes
Using '<' instead of '>', which keeps the short words instead of the long ones.
Using 'word' as the value instead of its length.
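The completed comprehension, run on a small sample list (the contents of `words` are assumed here for illustration):

```python
words = ["spark", "is", "fast", "ok"]  # sample input (assumed)

# Keys are the words longer than 3 characters; values are their lengths.
lengths = {word: len(word) for word in words if len(word) > 3}
```

Only "spark" and "fast" pass the length filter, so the result maps each of them to its character count.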
Task 5: fill in the blank (hard)

Fill all three blanks to build a dictionary mapping each word longer than 4 characters to its uppercase form.

Python
result = [1](([2], [3]) for [2] in words if len([2]) > 4)
A. dict
B. word
C. word.upper()
D. list
Common Mistakes
Using 'list' instead of 'dict', which produces a list of tuples instead of a dictionary.
Using 'word.upper()' as key instead of 'word'.
Using inconsistent variable names.
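The completed expression, with 'word' as the key and its uppercase form as the value (matching the Common Mistakes list above). The contents of `words` are assumed here for illustration:

```python
words = ["spark", "cache", "df", "action"]  # sample input (assumed)

# dict() consumes (key, value) pairs from the generator expression.
result = dict((word, word.upper()) for word in words if len(word) > 4)
```

Passing the same generator to `list()` would instead yield a list of `(word, WORD)` tuples, which is why `dict` is the required first blank.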