Apache Spark · data · ~10 mins

Why data quality prevents downstream failures in Apache Spark - Test Your Understanding

Practice - 5 Tasks
Answer the questions below
1. Fill in the blank (easy)

Complete the code to read a CSV file into a Spark DataFrame.

Apache Spark
df = spark.read.[1]("data.csv")
A. text
B. csv
C. parquet
D. json
Common Mistakes
Using 'json' or 'parquet' to read a CSV file causes errors.
Using 'text' reads lines as strings, not structured data.
2. Fill in the blank (medium)

Complete the code to drop rows with any null values.

Apache Spark
clean_df = df.[1]()
A. dropna
B. na.drop
C. fillna
D. dropDuplicates
Common Mistakes
Using 'fillna' fills nulls but does not remove rows.
Using 'dropDuplicates' removes duplicate rows, not nulls.
3. Fill in the blank (hard)

Complete the code to filter rows where 'age' is greater than 18.

Apache Spark
adults = df.filter(df.age [1] 18)
A. ==
B. <
C. >
D. <=
Common Mistakes
Using '<' selects younger people, not adults.
Using '==' selects only age exactly 18.
4. Fill in the blanks (hard)

Fill both blanks to create a dictionary of word lengths for words longer than 3 characters.

Python
lengths = {word: [1] for word in words if len(word) [2] 3}
A. len(word)
B. >
C. <
D. word
Common Mistakes
Using '<' selects words shorter than 3 characters.
Using 'word' as value stores the word, not its length.
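Task 4 filled in, with a hypothetical word list for illustration:

```python
# Hypothetical input; any list of strings works.
words = ["spark", "df", "data", "a", "read"]

# Key: the word itself; value: len(word). The 'if' clause keeps only
# words longer than 3 characters, so 'df' and 'a' are filtered out.
lengths = {word: len(word) for word in words if len(word) > 3}
print(lengths)  # → {'spark': 5, 'data': 4, 'read': 4}
```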
5. Fill in the blanks (hard)

Fill all three blanks to create a filtered dictionary with uppercase keys and values greater than 0.

Python
result = {[1]: [2] for k, v in data.items() if v [3] 0}
A. k.upper()
B. v
C. >
D. k
Common Mistakes
Using 'k' instead of 'k.upper()' keeps keys lowercase.
Using '<' keeps only negative values; '==' keeps only values equal to 0.
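Task 5 filled in, with a hypothetical input dictionary for illustration:

```python
# Hypothetical input; any dict of str -> int works.
data = {"errors": 3, "skipped": 0, "passed": 12, "retries": -1}

# k.upper() uppercases each key; 'if v > 0' drops entries whose value
# is zero or negative.
result = {k.upper(): v for k, v in data.items() if v > 0}
print(result)  # → {'ERRORS': 3, 'PASSED': 12}
```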