You have a Dataflow pipeline processing streaming data with fixed windows of 5 minutes. What will happen if late data arrives after the window has closed and the allowed lateness period has passed?
Think about how Dataflow handles data that arrives after the window and allowed lateness period.
Dataflow discards late data that arrives after the window has closed and the allowed lateness period has expired, maintaining consistent results and avoiding reprocessing.
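The decision can be illustrated with a small pure-Python sketch (outside Beam) of how an element's fate depends on where the watermark sits relative to the window end; the 2-minute allowed-lateness value is a hypothetical setting:

```python
from datetime import datetime, timedelta

WINDOW_SIZE = timedelta(minutes=5)       # fixed 5-minute windows
ALLOWED_LATENESS = timedelta(minutes=2)  # hypothetical allowed-lateness setting

def classify(event_time: datetime, watermark: datetime) -> str:
    """Mimic Dataflow's late-data handling for a fixed window.

    An element belongs to the window [start, end) containing its event
    time. It is "on-time" while the watermark has not passed the window
    end, "late" (pane re-fired) while the watermark is within allowed
    lateness past the end, and "dropped" after that.
    """
    # Align the window start to a multiple of the window size.
    epoch = datetime(1970, 1, 1)
    offset = (event_time - epoch) % WINDOW_SIZE
    window_end = event_time - offset + WINDOW_SIZE
    if watermark <= window_end:
        return "on-time"
    if watermark <= window_end + ALLOWED_LATENESS:
        return "late"
    return "dropped"
```

For an event at 10:03 in the [10:00, 10:05) window, a watermark at 10:06 still yields a late firing, while a watermark past 10:07 means the element is silently dropped.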
You need to process a large dataset that updates once a day and produce a report. Which Dataflow processing mode is most appropriate?
Consider how often the data updates and the best way to process large static datasets.
Batch mode is best for large datasets that update periodically, as it processes the entire dataset efficiently once per update.
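As a toy illustration of the batch pattern (the dataset and report shape are hypothetical): the whole static dataset is read once per daily update, aggregated, and a single report is emitted — there is no need for windows, watermarks, or triggers.

```python
from collections import defaultdict

# Hypothetical daily records: (product, units_sold)
daily_records = [
    ("widget", 3), ("gadget", 5), ("widget", 2), ("gadget", 1),
]

def build_report(records):
    """Batch-style processing: consume the entire dataset in one pass
    and produce one report, run once per daily refresh."""
    totals = defaultdict(int)
    for product, units in records:
        totals[product] += units
    return dict(totals)

report = build_report(daily_records)  # {"widget": 5, "gadget": 6}
```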
You want to restrict who can start and manage your Dataflow jobs in your GCP project. Which IAM role should you assign to users to allow them to create and cancel Dataflow jobs but not modify other resources?
Think about the role that allows job management but limits broader admin permissions.
The Dataflow Developer role (roles/dataflow.developer) lets users create and cancel Dataflow jobs without granting full admin rights over other project resources.
You want your Dataflow streaming job to automatically adjust the number of worker instances based on workload. Which autoscaling algorithm should you choose for best responsiveness?
Consider which metric best reflects workload changes in streaming data.
THROUGHPUT_BASED autoscaling adjusts the worker count based on observed throughput and backlog, providing responsive scaling for streaming jobs.
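With the Beam Python SDK these are passed as pipeline options when launching the job on the Dataflow runner; a sketch of the relevant flags (project, region, and the worker cap are placeholder values):

```python
# Pipeline options typically passed to a streaming job on the Dataflow
# runner. Flag names match the Beam Python Dataflow options; the project
# ID, region, and max worker count below are hypothetical placeholders.
dataflow_args = [
    "--runner=DataflowRunner",
    "--project=my-project",      # placeholder project ID
    "--region=us-central1",      # placeholder region
    "--streaming",
    "--autoscaling_algorithm=THROUGHPUT_BASED",
    "--max_num_workers=20",      # upper bound the autoscaler may reach
]
```

The `max_num_workers` cap matters: without it, THROUGHPUT_BASED scaling can grow the worker pool (and the bill) as far as quota allows.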
You have a Dataflow batch pipeline that processes large files daily. You want to reduce cost without significantly increasing processing time. Which combination of strategies is best?
Think about balancing worker count and pipeline optimizations.
Autoscaling capped with a maximum worker count keeps cost bounded, while shuffle optimizations, such as combining values before a group-by-key, improve performance by reducing data movement between workers.
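The shuffle saving can be sketched in plain Python: pre-aggregating per worker before the shuffle boundary (what Beam's CombinePerKey does automatically via combiner lifting) means only one partial result per key per worker crosses the network instead of every record. The batch contents are made up for illustration.

```python
from collections import Counter

# Hypothetical per-worker batches of (key, value) records.
worker_batches = [
    [("a", 1), ("a", 1), ("b", 1)] * 100,  # worker 1: 300 records
    [("a", 1), ("b", 1), ("b", 1)] * 100,  # worker 2: 300 records
]

# Naive GroupByKey: every record crosses the shuffle boundary.
naive_shuffled = sum(len(batch) for batch in worker_batches)

# Combine-before-shuffle (combiner lifting): each worker pre-sums
# locally, so only one partial sum per key per worker is shuffled.
partials = [Counter(k for k, _ in batch) for batch in worker_batches]
lifted_shuffled = sum(len(p) for p in partials)

# naive_shuffled: 600 records shuffled vs lifted_shuffled: 4 partial sums
```

Fewer shuffled bytes means less time spent on data movement, which in turn lets the capped worker pool finish the daily batch in roughly the same wall-clock time at lower cost.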