You want to process large amounts of data collected over a day and run analytics once daily. Which data pipeline pattern fits best?
Think about when the data is processed: all at once or continuously.
Batch pipelines process data in large chunks at scheduled times, ideal for daily analytics.
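The batch pattern can be sketched in a few lines: collect a full day's records, then produce one analytics report in a single scheduled run. The event data and function names below are illustrative, not from any GCP API.

```python
# Hypothetical day's worth of collected events: (timestamp, value) pairs.
events = [
    ("2024-01-15T02:10:00", 120),
    ("2024-01-15T09:45:00", 305),
    ("2024-01-15T21:30:00", 75),
]

def run_daily_batch(records):
    """Batch pattern: process the whole day's data in one scheduled run."""
    total = sum(value for _, value in records)
    return {"count": len(records), "total": total, "avg": total / len(records)}

# In practice this would be triggered once a day by a scheduler (e.g. cron);
# results only become available after the scheduled run completes.
report = run_daily_batch(events)
print(report)
```

The key property is the trigger: the job runs on a schedule over accumulated data, so analytics latency is up to a day, in exchange for simple, high-throughput processing.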
In a streaming pipeline using Google Cloud Pub/Sub, if a message is not acknowledged by the subscriber, what is the expected behavior?
Consider how Pub/Sub ensures message delivery reliability.
If the subscriber does not acknowledge a message before the acknowledgment deadline, Pub/Sub redelivers it, retrying until the message is acknowledged or its retention period expires.
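This at-least-once redelivery behavior can be simulated without Pub/Sub itself. The sketch below is an assumption-laden toy model (the function names and `max_attempts` cap are invented for illustration, not part of the Pub/Sub client library): any message the subscriber fails to ack goes back on the queue for redelivery.

```python
import collections

def deliver_until_acked(messages, ack_fn, max_attempts=5):
    """Toy model of at-least-once delivery: redeliver any message the
    subscriber does not acknowledge, up to max_attempts per message."""
    pending = collections.deque(messages)
    attempts = collections.Counter()
    delivered = []
    while pending:
        msg = pending.popleft()
        attempts[msg] += 1
        if ack_fn(msg, attempts[msg]):
            delivered.append(msg)    # acked: stop redelivering this message
        elif attempts[msg] < max_attempts:
            pending.append(msg)      # no ack: redeliver later
    return delivered, dict(attempts)

# A subscriber that only manages to ack "b" on its second delivery.
acked, tries = deliver_until_acked(["a", "b"], lambda m, n: m != "b" or n >= 2)
print(acked, tries)  # ['a', 'b'] {'a': 1, 'b': 2}
```

Note the consequence for real subscribers: because redelivery can duplicate messages, processing should be idempotent.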
You have a multi-stage data pipeline in GCP involving Cloud Storage, Dataflow, and BigQuery. Which practice best secures data in transit between these services?
Think about how to keep data inside Google's private network.
VPC Service Controls and private (internal) IP addresses keep transfers between these services on Google's network, so data in transit never traverses the public internet.
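VPC Service Controls perimeters are configured in GCP rather than in application code, but the "private IPs" half of the answer can be illustrated: a quick check (using only the standard `ipaddress` module; the worker addresses are hypothetical) that an address falls in a private range and therefore stays off the public internet.

```python
import ipaddress

def is_internal(ip):
    """Return True if the address is in a private range (e.g. RFC 1918),
    as a Dataflow worker using internal IPs only would be."""
    return ipaddress.ip_address(ip).is_private

# Hypothetical worker addresses:
print(is_internal("10.128.0.7"))  # True  -- stays on the internal network
print(is_internal("8.8.8.8"))     # False -- routable over the public internet
```

A guardrail like this is only a sanity check; the actual enforcement comes from the VPC Service Controls perimeter and from launching workers without external IPs.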
You need to analyze data with minimal delay after it is generated. Which data pipeline pattern is most suitable?
Consider how quickly data should be available for analysis.
Streaming pipelines process data as it arrives, enabling low-latency analytics.
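In contrast to the batch sketch, the streaming pattern processes each event the moment it arrives. This is a minimal simulation (the generator and transform are invented for illustration) showing per-event handling and per-event latency measurement.

```python
import time

def event_stream():
    """Hypothetical source that yields events as they are generated."""
    for value in (7, 13, 42):
        yield {"value": value, "generated_at": time.monotonic()}

def process_stream(stream):
    """Streaming pattern: handle each event immediately on arrival
    instead of waiting for a scheduled batch window."""
    results = []
    for event in stream:
        latency = time.monotonic() - event["generated_at"]
        results.append((event["value"] * 2, latency))  # per-event transform
    return results

for doubled, latency in process_stream(event_stream()):
    print(doubled, f"processed {latency:.6f}s after generation")
```

The latency here is the gap between generation and processing, which the streaming pattern keeps small; a batch pipeline would instead accumulate all three events and process them hours later.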
You have a hybrid data pipeline combining batch and streaming in GCP. What is the best practice to optimize cost without sacrificing performance?
Think about adjusting resources based on workload and timing.
Autoscaling streaming jobs adjusts worker resources to match demand, and scheduling batch jobs during off-peak hours reduces cost while maintaining performance.
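The cost intuition behind this answer can be made concrete with toy numbers (the rates and demand figures below are invented, not real GCP pricing): autoscaling pays for workers only while demand needs them, and off-peak batch scheduling buys the same worker-hours at a cheaper rate.

```python
# Illustrative rates per worker-hour -- NOT real GCP pricing.
PEAK_RATE, OFF_PEAK_RATE = 1.00, 0.60

def streaming_cost(demand_by_hour, workers_per_unit=1):
    """Autoscaled streaming job: worker count tracks demand each hour."""
    return sum(d * workers_per_unit * PEAK_RATE for d in demand_by_hour)

def batch_cost(worker_hours, off_peak=True):
    """Batch job cost; scheduling it off-peak uses the cheaper rate."""
    rate = OFF_PEAK_RATE if off_peak else PEAK_RATE
    return worker_hours * rate

demand = [2, 2, 8, 8, 3, 2]  # hypothetical demand units per hour
fixed = max(demand) * len(demand) * PEAK_RATE  # always provisioned for peak
print("autoscaled:", streaming_cost(demand), "vs fixed:", fixed)
print("off-peak batch:", batch_cost(10), "vs peak batch:",
      batch_cost(10, off_peak=False))
```

Even in this toy model, autoscaling avoids paying for peak capacity around the clock, and shifting the batch run off-peak cuts its cost without touching its runtime.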