Practice - 5 Tasks
Answer the questions below
1. Fill in the blank
Easy: Complete the code to create a Spark session.
Apache Spark
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName([1]).getOrCreate()
Common Mistakes
Forgetting to put the app name in quotes.
Using a variable name without quotes.
Explanation
The appName method expects the application name as a string, so the name must be wrapped in quotes.
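As a minimal sketch of the completed call, assuming PySpark is installed and a Java runtime is available: the helper name create_session and the app name "MyApp" are illustrative and not part of the exercise. The import sits inside the function so that defining the helper does not itself require Spark.

```python
def create_session(app_name: str = "MyApp"):
    """Return a SparkSession named `app_name` (hypothetical helper)."""
    # Imported here so that defining the helper does not require Spark.
    from pyspark.sql import SparkSession

    # appName takes a string; a bare, unquoted MyApp would raise NameError.
    return SparkSession.builder.appName(app_name).getOrCreate()
```

Because getOrCreate reuses a session that is already running, calling the helper twice does not start a second session.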
2. Fill in the blank
Medium: Complete the code to read a CSV file into a Spark DataFrame.
Apache Spark
df = spark.read.csv([1], header=True, inferSchema=True)
Common Mistakes
Not putting the file name in quotes.
Using a variable name without defining it.
Explanation
The file path must be a string, so it needs to be in quotes.
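A short sketch of the completed read, assuming an existing SparkSession is passed in. The helper name load_csv and the file name "people.csv" are hypothetical placeholders.

```python
def load_csv(spark, path: str = "people.csv"):
    """Read a CSV file into a DataFrame (hypothetical helper and file name)."""
    # header=True uses the first row as column names;
    # inferSchema=True asks Spark to guess column types from the data.
    return spark.read.csv(path, header=True, inferSchema=True)
```

The path is passed as a quoted string, either directly or through a variable that holds one.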
3. Fill in the blank
Hard: Fix the error in the code to select the 'age' column from the DataFrame.
Apache Spark
ages = df.select([1])
Common Mistakes
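Passing the column name without quotes, which Python treats as an undefined variable.
Writing df.age when the exercise expects a quoted name (df.age is a valid Column and select accepts it, but it is not the expected answer here).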
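Explanation
The expected answer is the column name as a quoted string, 'age'. The select method also accepts Column objects such as df.age, but a bare, unquoted name raises a NameError.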
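The selection can be sketched as a small function, assuming df is a DataFrame that has an 'age' column. The helper name select_age is hypothetical.

```python
def select_age(df):
    """Return a DataFrame with only the 'age' column (hypothetical helper)."""
    # The quoted string names the column; df.select(df.age) is an
    # equivalent form that passes a Column object instead.
    return df.select("age")
```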
4. Fill in the blank
Hard: Fill both blanks to filter rows where age is greater than 30 and select the 'name' column.
Apache Spark
filtered = df.filter(df.age [1] 30).select([2])
Common Mistakes
Using '<' instead of '>' for filtering.
Selecting column without quotes.
Explanation
Use '>' to keep rows where age is greater than 30, then pass 'name' as a quoted string to select.
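A sketch of the filter-then-select chain, assuming df has 'age' and 'name' columns. The helper name names_over_30 is illustrative only.

```python
def names_over_30(df):
    """Return the 'name' column for rows with age above 30 (hypothetical helper)."""
    # df.age > 30 builds a Column expression; filter keeps the rows where it
    # is true, and select then narrows the result to the 'name' column.
    return df.filter(df.age > 30).select("name")
```

Filtering before selecting matters here: once only 'name' is selected, the 'age' column is no longer available to filter on.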
5. Fill in the blank
Hard: Fill all three blanks to create a dictionary with column names as keys and their data types as values.
Apache Spark
schema_dict = [1]: [2] for [3] in df.schema.fields}
Common Mistakes
Not starting with '{' for dictionary comprehension.
Using wrong variable name in the loop.
Mixing up field.name and field.dataType.
Explanation
Use a dictionary comprehension that starts with '{', iterates with the loop variable field over df.schema.fields, and maps field.name to field.dataType.
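Written out in full, the comprehension might look like the sketch below, assuming df is any DataFrame. The helper name schema_to_dict is hypothetical.

```python
def schema_to_dict(df):
    """Map each column name to its Spark data type (hypothetical helper)."""
    # df.schema.fields is a list of StructField objects,
    # each with a .name and a .dataType attribute.
    return {field.name: field.dataType for field in df.schema.fields}
```

The keys are plain strings and the values are Spark type objects, so a string column would typically map to StringType().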