Recall & Review
beginner
What is a column expression in Apache Spark?
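A column expression is a Column object that refers to a DataFrame column or describes a computation on one or more columns. It is used for operations such as selecting, filtering, and transforming data, and it is evaluated only when the DataFrame is executed.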
beginner
How do you add two columns in a Spark DataFrame?
You can add two columns using the '+' operator on column expressions, for example: df.withColumn('sum', df['col1'] + df['col2']).
beginner
What does the function 'lit()' do in Spark column expressions?
The 'lit()' function creates a column of a literal (constant) value, which can be used in expressions with other columns.
intermediate
Explain the use of 'when' and 'otherwise' functions in Spark.
'when' builds conditional expressions on columns, similar to IF or CASE WHEN statements. 'otherwise' defines the value used when no 'when' condition matches; if it is omitted, unmatched rows get null.
intermediate
How can you chain multiple column functions in Spark?
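You can chain methods on a column expression because each call returns a new Column, for example: df.select(col('a').cast('int').alias('new')). Inside withColumn the alias is redundant, since the first argument already names the column: df.withColumn('new', col('a').cast('int')).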
Which function creates a constant column in Spark?
The lit() function creates a column with a constant value.
How do you refer to a column named 'age' in a Spark DataFrame?
df['age'] or col('age') refers to the column named 'age'.
What does the 'when' function do in Spark?
The when function is used for conditional logic in column expressions.
Which operator is used to add two columns in Spark?
The '+' operator adds two column expressions.
How do you handle the 'else' part of a conditional expression in Spark?
The otherwise() function defines the value used when no when() condition matches.
Describe how to create a new column in a Spark DataFrame using column expressions and functions.
Think about how you can combine columns and constants to make a new column.
Explain how conditional logic is implemented in Spark column expressions.
Consider how you choose values based on conditions in a DataFrame.