Practice - 5 Tasks
Answer the questions below
1. Fill in the blank (easy)
Complete the code to create a SparkSession for testing.
Apache Spark
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName([1]).getOrCreate()
Common Mistakes
Using production app names in test code.
Not setting an app name at all.
Answer: 'TestApp'
An app name such as 'TestApp' makes it clear that this SparkSession exists for testing.
2. Fill in the blank (medium)
Complete the code to read a CSV file into a DataFrame for testing.
Apache Spark
df = spark.read.format([1]).option("header", "true").load("test_data.csv")
Common Mistakes
Using 'json' or 'parquet' format for CSV files.
Forgetting to set header option to true.
Answer: 'csv'
The data file is a CSV, so the format must be 'csv' for Spark to parse it correctly.
3. Fill in the blank (hard)
Fix the error in the assertion that checks row count in the test DataFrame.
Apache Spark
assert df.count() [1] 100, "Row count should be 100"
Common Mistakes
Using '!=' which checks for inequality.
Using '<' or '>' which check for less or more rows.
Answer: ==
To verify that the DataFrame has exactly 100 rows, use '==' in the assertion.
4. Fill in the blanks (hard)
Fill both blanks to filter the DataFrame for rows where 'age' is greater than 30.
Apache Spark
filtered_df = df.filter(df.[1] [2] 30)
Common Mistakes
Using the wrong column name like 'salary'.
Using '<' instead of '>' operator.
Answer: age, >
To keep only the rows where age is greater than 30, use df.age > 30 as the filter condition.
5. Fill in the blanks (hard)
Fill all three blanks to create a dictionary comprehension that maps column names to their data types for columns with type 'string'.
Apache Spark
string_cols = {
    [1]: df.schema[[2]].dataType.simpleString()
    for [3] in df.columns
    if df.schema[[3]].dataType.simpleString() == "string"
}
Common Mistakes
Using different variable names inconsistently.
Using an index variable instead of column names.
Answer: col in all three blanks
Using 'col' consistently as the loop variable maps each column name to its data type.