Recall & Review
beginner
What is the main goal of unit testing Spark transformations?
The main goal is to verify that each transformation on Spark DataFrames or RDDs produces the expected output for given input data, ensuring correctness before running on large datasets.
beginner
Why do we use small sample data in unit tests for Spark transformations?
Using small sample data makes tests fast and easy to understand. It helps quickly check if the transformation logic works without processing large datasets.
intermediate
Which Spark feature helps to compare expected and actual DataFrames in unit tests?
The collect() method gathers all rows to the driver as a Python list, which can then be compared with the expected rows. Spark 3.5+ also provides the pyspark.testing.assertDataFrameEqual utility, which compares two DataFrames and ignores row order by default.
intermediate
How do you isolate a Spark transformation for unit testing?
You write the transformation as a pure function that takes a DataFrame as input and returns a transformed DataFrame. This way, you can test it independently from the rest of the pipeline.
beginner
What is a common tool or framework used for unit testing Spark code in Python?
Pytest is commonly used for unit testing Spark code in Python. It allows writing simple test functions, sharing a SparkSession across tests via fixtures, and integrates well with Spark's testing utilities.
What should a unit test for a Spark transformation focus on?
Unit tests focus on verifying correctness with small, controlled inputs, not on performance or full dataset runs.
Which method is commonly used to bring Spark DataFrame data to the driver for comparison in tests?
The collect() method returns all rows as a list to the driver, useful for comparing test results.
Why is it important to write Spark transformations as pure functions for testing?
Pure functions have no side effects and depend only on inputs, making them easier to test independently.
Which Python testing framework is popular for Spark unit tests?
Pytest is widely used in Python for writing simple and effective unit tests, including for Spark.
What is a good practice when creating test data for Spark transformation tests?
Small, simple datasets help quickly verify transformation logic and make tests clear and fast.
Explain how you would write a unit test for a Spark DataFrame transformation.
Think about input, transformation, output, and verification steps.
Why is it important to isolate Spark transformations as pure functions for unit testing?
Consider how pure functions behave and why that helps testing.