
Integration testing pipelines in Apache Spark - Interactive Code Practice

Practice - 5 Tasks
Answer the questions below
1. Fill in the blank (easy)

Complete the code to create a SparkSession for testing.

Apache Spark
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName([1]).getOrCreate()
Options:
A. "DataPipeline"
B. "ProductionApp"
C. "TestApp"
D. "MainApp"
Common Mistakes
Using production app names in test code.
Not setting an app name at all.
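
Integration tests usually pin the session to a local master so the suite runs without a cluster. The following is a minimal sketch under that assumption. The helper name `build_test_session` and the values in `TEST_CONF` are illustrative choices, not part of the exercise.

```python
# Settings for a throwaway local session (illustrative values, not from the exercise).
TEST_CONF = {
    "spark.master": "local[2]",            # two local threads, no cluster needed
    "spark.sql.shuffle.partitions": "2",   # small shuffles keep tiny test data fast
    "spark.ui.enabled": "false",           # no web UI during test runs
}


def build_test_session(app_name="TestApp", conf=TEST_CONF):
    """Create (or reuse) a local SparkSession configured for tests."""
    # Imported here so this module can be loaded without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

Because `getOrCreate()` reuses an existing session, calling `spark.stop()` at the end of the test run keeps settings from leaking into later runs.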
2. Fill in the blank (medium)

Complete the code to read a CSV file into a DataFrame for testing.

Apache Spark
df = spark.read.format([1]).option("header", "true").load("test_data.csv")
Options:
A. "csv"
B. "text"
C. "parquet"
D. "json"
Common Mistakes
Using 'json' or 'parquet' format for CSV files.
Forgetting to set the header option to true, so the first line is read as data.
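
A test that reads `test_data.csv` needs that file to exist with known contents. One way to arrange this, sketched here with hypothetical helper names (`write_fixture_csv`, `load_fixture`) and made-up sample rows, is to write the fixture from the test itself and read it back through Spark:

```python
import csv
import os
import tempfile


def write_fixture_csv(path, header, rows):
    """Write a small CSV fixture with a header row."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


def load_fixture(spark, path):
    """Read the fixture as in the exercise, with schema inference added."""
    return (
        spark.read.format("csv")
        .option("header", "true")        # first line holds the column names
        .option("inferSchema", "true")   # without this, every column is a string
        .load(path)
    )


# Made-up sample data, written to a temporary directory.
fixture_path = os.path.join(tempfile.mkdtemp(), "test_data.csv")
write_fixture_csv(fixture_path, ["name", "age"], [["Ana", 34], ["Bo", 27]])
```

With a session available, `load_fixture(spark, fixture_path)` would return a two-row DataFrame whose `age` column is numeric rather than string.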
3. Fill in the blank (hard)

Complete the assertion so that it checks the test DataFrame has exactly 100 rows.

Apache Spark
assert df.count() [1] 100, "Row count should be 100"
Options:
A. !=
B. <
C. >
D. ==
Common Mistakes
Using '!=' which checks for inequality.
Using '<' or '>' which check for less or more rows.
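
When a row-count check fails, the actual count is the first thing worth knowing. A small sketch of a reusable check follows. The helper name `assert_row_count` is an assumption, and it relies only on the object having a `count()` method, as a PySpark DataFrame does.

```python
def assert_row_count(df, expected):
    """Fail with the actual count in the message when it differs from expected."""
    actual = df.count()  # on a DataFrame, count() is an action that runs a Spark job
    assert actual == expected, f"Row count should be {expected}, got {actual}"
```

In a test, `assert_row_count(df, 100)` replaces the bare assertion and reports both numbers on failure.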
4. Fill in the blank (hard)

Fill both blanks to filter the DataFrame for rows where 'age' is greater than 30.

Apache Spark
filtered_df = df.filter(df.[1] [2] 30)
Options:
A. age
B. >
C. <
D. salary
Common Mistakes
Using the wrong column name like 'salary'.
Using '<' instead of '>' operator.
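
A filter is easiest to test against an independently computed expectation. The sketch below pairs a Spark filter with a plain-Python oracle over the same rows. Both helper names and the sample rows are hypothetical.

```python
def filter_over(df, column, threshold):
    """Keep rows whose `column` value is strictly greater than `threshold`."""
    # df[column] builds a Column expression; Spark evaluates the comparison.
    return df.filter(df[column] > threshold)


def expected_over(rows, column, threshold):
    """Plain-Python oracle for the same filter, used to check the Spark result."""
    return [row for row in rows if row[column] > threshold]


# Made-up rows, including one exactly on the boundary.
rows = [
    {"name": "Ana", "age": 34},
    {"name": "Bo", "age": 27},
    {"name": "Cy", "age": 30},
]
expected = expected_over(rows, "age", 30)
```

The row with age 30 is excluded because the comparison is strict. Including such a boundary row in the test data catches an accidental `>=`.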
5. Fill in the blank (hard)

Fill all three blanks to create a dictionary comprehension that maps column names to their data types for columns with type 'string'.

Apache Spark
string_cols = { [1]: df.schema[[2]].dataType.simpleString() for [3] in df.columns if df.schema[[3]].dataType.simpleString() == "string" }
Options:
A. col
D. idx
Common Mistakes
Using different variable names inconsistently.
Using an index variable instead of column names.
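
The same mapping can also be built from `df.dtypes`, which in PySpark is a list of `(name, type)` string pairs. The sketch below takes that list as input so the logic can be checked without a running session. The function name and the sample schema are assumptions for illustration.

```python
def string_columns(dtypes):
    """Map column name -> type for columns whose type is 'string'.

    `dtypes` is a list of (name, type) pairs, the shape of PySpark's `df.dtypes`.
    """
    return {name: dtype for name, dtype in dtypes if dtype == "string"}


# Hypothetical schema, written out by hand in place of a real df.dtypes.
sample_dtypes = [("name", "string"), ("age", "int"), ("city", "string")]
string_cols = string_columns(sample_dtypes)
```

With a real DataFrame, the call would be `string_columns(df.dtypes)`.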