beginner

What is a transformation in Apache Spark?

A transformation is an operation on a Spark dataset, such as map() or filter(), that returns a new dataset and leaves the original unchanged. It does not execute immediately; it only adds a step to the plan that Spark will run later.

beginner

Why do transformations in Spark build processing pipelines instead of running immediately?

Transformations are lazy: each one adds a step to a chain (the pipeline) instead of running right away. Spark waits until an action asks for a result, so it can optimize the whole chain and skip work that is not needed.
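The behaviour described above can be imitated in a few lines of plain Python. This is only a toy sketch, not Spark's API: the `LazyDataset` class and the `double` helper are hypothetical names made up for illustration, and the method names `map`, `filter`, `collect`, and `count` are borrowed from Spark only to keep the analogy readable.

```python
class LazyDataset:
    """Toy stand-in for a Spark dataset (hypothetical, not Spark's API)."""

    def __init__(self, source, steps=()):
        self._source = source  # the input data, never modified
        self._steps = steps    # recorded transformations, in order

    # Transformations: record a step and return a NEW dataset. No data is touched.
    def map(self, fn):
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, pred):
        return LazyDataset(self._source, self._steps + (("filter", pred),))

    # Actions: run every recorded step and return a plain Python value.
    def collect(self):
        rows = list(self._source)
        for kind, fn in self._steps:
            if kind == "map":
                rows = [fn(x) for x in rows]
            else:
                rows = [x for x in rows if fn(x)]
        return rows

    def count(self):
        return len(self.collect())


calls = []  # log of every element the map function actually sees


def double(x):
    calls.append(x)
    return x * 2


numbers = LazyDataset([1, 2, 3, 4, 5])
pipeline = numbers.map(double).filter(lambda x: x > 4)

calls_before_action = list(calls)  # still empty: only a plan exists so far
result = pipeline.collect()        # the action runs the whole pipeline
print(calls_before_action, result)
```

Until `collect()` is called, `double` never runs, which is the point of the sketch: building the pipeline is cheap, and the cost is paid only when a result is requested.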

intermediate

How does lazy evaluation benefit Spark processing pipelines?

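Because Spark sees the whole chain before running anything, it can combine consecutive transformations such as map() and filter() into a single pass over the data. This avoids storing intermediate results, reduces data movement, and speeds up processing.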
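The idea of combining steps into one pass can be seen with plain Python generators. This is an analogy only, under the assumption that a generator chain is a fair stand-in for a fused pipeline; Spark's real planning is considerably more involved. The helper names `add_one` and `is_even` are invented for this sketch.

```python
log = []  # records the order in which each step touches each element


def add_one(x):
    log.append(f"map({x})")
    return x + 1


def is_even(x):
    log.append(f"filter({x})")
    return x % 2 == 0


data = [1, 2, 3]

# Generators are lazy: these two lines only describe the work.
mapped = (add_one(x) for x in data)
kept = (x for x in mapped if is_even(x))

log_before = list(log)  # nothing has run yet
result = list(kept)     # plays the role of an action

print(result)
print(log)
```

The log comes out interleaved (map, filter, map, filter, ...), showing that each element flows through both steps before the next one is read, so no full intermediate list is ever built.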

beginner

What triggers the execution of a Spark processing pipeline?

An action, like count() or collect(), triggers Spark to execute all the transformations in the pipeline and produce results.

intermediate

Explain the role of transformations in building a Spark processing pipeline using a real-life example.

Imagine making a sandwich: each step (slice the bread, add the filling, cut it in half) is like a transformation. You plan all the steps first (the pipeline) but only make the sandwich (execution) when you are hungry (the action).

What does a transformation in Spark do?

A. Creates a new dataset without running immediately

B. Immediately processes data and returns results

C. Deletes data from the dataset

D. Saves data to disk

When does Spark execute the transformations in a pipeline?

A. At the start of the program

B. Right after each transformation

C. When the cluster starts

D. When an action is called

Why does Spark use lazy evaluation for transformations?

A. To optimize and combine steps before running

B. To slow down processing

C. To save memory by deleting data

D. To run transformations multiple times

Which of these is an example of an action in Spark?

A. flatMap()

B. count()

C. filter()

D. map()

What is the main benefit of building a processing pipeline with transformations?

A. Automatic data deletion

B. Immediate data output

C. Improved performance through optimization

D. Manual step-by-step execution

Describe how transformations build a processing pipeline in Spark and why this is useful.

Explain the difference between a transformation and an action in Spark with simple examples.