
Dataflow for stream/batch processing in GCP - Practice Problems & Coding Challenges

Challenge - 5 Problems
🎖️
Dataflow Mastery Badge
Get all challenges correct to earn this badge!
Service Behavior
Intermediate
Understanding Dataflow Windowing Behavior

You have a Dataflow pipeline processing streaming data with fixed windows of 5 minutes. What will happen if late data arrives after the window has closed and the allowed lateness period has passed?

A. The late data will be discarded and not processed in the pipeline.
B. The pipeline will reprocess the entire window including the late data.
C. The late data will be processed immediately in a new window.
D. The pipeline will throw an error and stop processing.
💡 Hint

Think about how Dataflow handles data that arrives after the window and allowed lateness period.
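To make the lifecycle concrete, here is a small conceptual simulation of the decision a streaming runner makes for each record. This is not the Beam/Dataflow API; the window size matches the question, while the allowed-lateness value and function names are illustrative assumptions.

```python
# Conceptual sketch: fate of a record in a 5-minute fixed window,
# relative to the watermark and a configured allowed lateness.
WINDOW_SIZE = 5 * 60        # seconds (fixed windows of 5 minutes)
ALLOWED_LATENESS = 60       # seconds (hypothetical setting)

def window_end(event_ts: int) -> int:
    """End of the fixed window that the event's timestamp falls into."""
    return (event_ts // WINDOW_SIZE + 1) * WINDOW_SIZE

def classify_record(event_ts: int, watermark: int) -> str:
    """Classify a record relative to its window and the watermark."""
    end = window_end(event_ts)
    if watermark < end:
        return "on-time"                # window still open
    elif watermark < end + ALLOWED_LATENESS:
        return "late-but-accepted"      # within allowed lateness, fires a late pane
    else:
        return "dropped"                # past allowed lateness: discarded

print(classify_record(event_ts=100, watermark=200))   # on-time
print(classify_record(event_ts=100, watermark=320))   # late-but-accepted
print(classify_record(event_ts=100, watermark=400))   # dropped
```

The last case is the one the question asks about: once the watermark has passed the window end plus the allowed lateness, the record is simply discarded.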

Architecture
Intermediate
Choosing Batch vs Streaming in Dataflow

You need to process a large dataset that updates once a day and produce a report. Which Dataflow processing mode is most appropriate?

A. Batch mode, to process the entire dataset once daily.
B. Streaming mode, to process data as it arrives continuously.
C. Streaming mode with fixed windows of 1 day.
D. Batch mode with micro-batches every few minutes.
💡 Hint

Consider how often the data updates and the best way to process large static datasets.
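The choice shows up directly in how the job is launched. As a sketch (the script, project, and bucket names are hypothetical), a Beam Python pipeline submitted to the Dataflow runner without the `--streaming` flag runs once as a batch job over the full input:

```shell
# Hypothetical once-daily batch launch: no --streaming flag,
# so DataflowRunner processes the whole dataset and then exits.
python daily_report.py \
  --runner=DataflowRunner \
  --project=my-project \
  --region=us-central1 \
  --temp_location=gs://my-bucket/tmp \
  --input=gs://my-bucket/exports/latest/*.csv
```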

Security
Advanced
Securing Dataflow Pipeline Access

You want to restrict who can start and manage your Dataflow jobs in your GCP project. Which IAM role should you assign to users to allow them to create and cancel Dataflow jobs but not modify other resources?

A. roles/viewer
B. roles/dataflow.admin
C. roles/dataflow.worker
D. roles/dataflow.developer
💡 Hint

Think about the role that allows job management but limits broader admin permissions.
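As a sketch of how such a role is assigned (the project ID and member are placeholders), the binding is added at the project level with `gcloud`:

```shell
# Grant a user the ability to create and cancel Dataflow jobs
# without broader project permissions (placeholder names).
gcloud projects add-iam-policy-binding my-project \
  --member="user:data-engineer@example.com" \
  --role="roles/dataflow.developer"
```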

Configuration
Advanced
Configuring Autoscaling in Dataflow

You want your Dataflow streaming job to automatically adjust the number of worker instances based on workload. Which autoscaling algorithm should you choose for best responsiveness?

A. CPU_BASED - scale workers based on CPU usage.
B. NONE - disable autoscaling and set fixed workers.
C. THROUGHPUT_BASED - scale workers based on data throughput.
D. MEMORY_BASED - scale workers based on memory usage.
💡 Hint

Consider which metric best reflects workload changes in streaming data.
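In practice the algorithm is selected with a pipeline option at launch time. A hedged sketch, assuming a Beam Python pipeline (script and project names are hypothetical):

```shell
# Streaming launch with throughput-based autoscaling;
# --max_num_workers caps how far the job can scale out.
python streaming_job.py \
  --runner=DataflowRunner \
  --streaming \
  --autoscaling_algorithm=THROUGHPUT_BASED \
  --max_num_workers=20 \
  --project=my-project \
  --region=us-central1 \
  --temp_location=gs://my-bucket/tmp
```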

Best Practice
Expert
Optimizing Dataflow Pipeline for Cost and Performance

You have a Dataflow batch pipeline that processes large files daily. You want to reduce cost without significantly increasing processing time. Which combination of strategies is best?

A. Use autoscaling with minimum workers set to zero and disable streaming engine.
B. Use autoscaling with a maximum number of workers and enable shuffle optimization.
C. Use the smallest machine type and disable shuffle optimization to save cost.
D. Disable autoscaling and set a high fixed number of workers to finish faster.
💡 Hint

Think about balancing worker count and pipeline optimizations.
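One way such a combination can look at launch time, as a sketch with hypothetical names: capped autoscaling plus the service-side Dataflow Shuffle for batch jobs, which moves shuffle work off the worker VMs.

```shell
# Batch launch combining capped autoscaling with service-side shuffle.
# shuffle_mode=service enables Dataflow Shuffle for batch pipelines,
# often allowing fewer/smaller workers at similar run times.
python daily_batch.py \
  --runner=DataflowRunner \
  --autoscaling_algorithm=THROUGHPUT_BASED \
  --max_num_workers=50 \
  --experiments=shuffle_mode=service \
  --project=my-project \
  --region=us-central1 \
  --temp_location=gs://my-bucket/tmp
```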