Practice - 5 Tasks
Answer the questions below
1. Fill in the blank (easy)
Complete the code to create a Spark session in the cloud environment.
Apache Spark
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName([1]).getOrCreate()
Common Mistakes
Forgetting to put quotes around the app name string.
Adding extra or unbalanced parentheses, which cause syntax errors.
The appName method requires a string argument, so it must be enclosed in quotes.
2. Fill in the blank (medium)
Complete the code to read a CSV file from cloud storage into a Spark DataFrame.
Apache Spark
df = spark.read.format([1]).option("header", "true").load("s3a://bucket/data.csv")
Common Mistakes
Using 'json' or 'parquet' format for CSV files.
Forgetting to specify the format.
To read CSV files, the format must be set to 'csv'.
3. Fill in the blank (hard)
Fix the error in the code to cache a DataFrame in Spark on the cloud.
Apache Spark
df.[1]()
Common Mistakes
Using incorrect method names like 'cached' or 'persisted'.
Trying to pass arguments to cache(), which takes none (persist() is the method that accepts a storage level).
The correct method to cache a DataFrame is 'cache()'.
4. Fill in the blank (hard)
Fill both blanks to filter a DataFrame for rows where age is greater than 30 in Spark on the cloud.
Apache Spark
filtered_df = df.filter(df.[1] [2] 30)
Common Mistakes
Using the wrong column name like 'name'.
Using '<' instead of '>' operator.
We filter on the 'age' column with the '>' operator to get rows where age is greater than 30.
5. Fill in the blank (hard)
Fill all three blanks to create a dictionary comprehension, as used in Spark cloud code, that maps each word to its length when the length is greater than 3.
Apache Spark
lengths = { [1]: [2] for [3] in words if len([3]) > 3 }
Common Mistakes
Using different variable names inconsistently.
Not using len(word) as the dictionary value.
We use 'word' as the loop variable over words and map each 'word' (the key) to 'len(word)' (the value).
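The completed comprehension is ordinary Python, so it can be checked without a cluster. The word list below is hypothetical sample data; in Spark code, words might be a Python list collected from a DataFrame or RDD.

```python
# Hypothetical sample input.
words = ["spark", "is", "fast", "and", "scalable"]

# Key: the word itself; value: its length; keep only words longer than 3.
lengths = {word: len(word) for word in words if len(word) > 3}

print(lengths)  # {'spark': 5, 'fast': 4, 'scalable': 8}
```

The comprehension runs in a single Python process. If the word list were too large to collect, a distributed equivalent could use RDD operations, for example sc.parallelize(words).filter(lambda w: len(w) > 3).map(lambda w: (w, len(w))).collectAsMap(), which still returns a local dict.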