Storage Transfer Service in GCP - Time & Space Complexity
When moving data between storage locations, it is important to understand how the time needed grows as the amount of data increases.
We want to know how the number of transfer operations changes as we move more files or more data.
Analyze the time complexity of the following operation sequence.
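```python
# Transfer job definition using the REST-style field names of the
# Storage Transfer Service TransferJob resource.
transferJob = {
    'projectId': 'my-project',
    'status': 'ENABLED',  # the job must be ENABLED to run
    'transferSpec': {
        'gcsDataSource': {'bucketName': 'source-bucket'},    # copy from this bucket
        'gcsDataSink': {'bucketName': 'destination-bucket'}  # copy into this bucket
    },
    'schedule': {'scheduleStartDate': {'year': 2024, 'month': 6, 'day': 1}}
}

# Hypothetical helpers standing in for the API's transferJobs.create
# and transferJobs.run methods.
createTransferJob(transferJob)
startTransferJob(transferJob)
```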
This sequence creates and starts a transfer job that moves data from one cloud storage bucket to another.
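Identify which API calls, resource provisioning steps, or data transfers repeat as the input grows. Creating and starting the job are one-time calls whose cost does not depend on the amount of data; the work that repeats is the copying itself: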
- Primary operation: Transferring each file or data chunk from source to destination.
- How many times: Once per file or data chunk in the source bucket.
As the number of files or total data size increases, the number of transfer operations grows roughly in direct proportion.
| Input Size (n) | Approx. API Calls/Operations |
|---|---|
| 10 files | About 10 transfer operations |
| 100 files | About 100 transfer operations |
| 1000 files | About 1000 transfer operations |
Pattern observation: The number of operations grows linearly with the number of files or data chunks.
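To see the linear pattern concretely, the sketch below models the source bucket as a plain Python list of object names and counts one copy operation per object, reproducing the table above. The `count_transfer_operations` function and the in-memory "buckets" are hypothetical stand-ins for work the service does server-side; they are not part of any Google Cloud API.

```python
def count_transfer_operations(source_objects: list[str]) -> int:
    """Model a full bucket-to-bucket copy: one copy operation per object.

    Hypothetical model only; the real service performs the copies itself.
    """
    destination: list[str] = []
    operations = 0
    for name in source_objects:   # loop body runs once per object -> O(n)
        destination.append(name)  # stand-in for copying one object
        operations += 1
    return operations


if __name__ == "__main__":
    for n in (10, 100, 1000):
        files = [f"file-{i}.dat" for i in range(n)]
        print(f"{n} files -> {count_transfer_operations(files)} transfer operations")
```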
Time Complexity: O(n)
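This means the total work, and therefore the time to complete the transfer, grows linearly with the number of files or the amount of data to move. Copying files in parallel shortens the wall-clock time by a constant factor but does not change this linear growth.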
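When files differ in size, it can help to count work in fixed-size chunks instead of whole files. The sketch below assumes a hypothetical chunk size of 8 MiB and assumes that even an empty file costs one operation; neither value describes the service's actual internals. It shows that the total chunk count, and therefore the work, grows linearly with the total number of bytes.

```python
CHUNK_SIZE = 8 * 1024 * 1024  # hypothetical chunk size (8 MiB), illustration only


def count_chunks(file_sizes: list[int], chunk_size: int = CHUNK_SIZE) -> int:
    """Total chunks needed to copy every file.

    Each file needs ceil(size / chunk_size) chunks; assumed minimum of one
    operation per file so that empty files are still counted.
    """
    return sum(max(1, (size + chunk_size - 1) // chunk_size) for size in file_sizes)


if __name__ == "__main__":
    mib = 1024 * 1024
    print(count_chunks([20 * mib] * 10))   # 20 MiB -> 3 chunks each -> 30
    print(count_chunks([20 * mib] * 100))  # 10x the data -> 300
```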
[X] Wrong: "Starting one transfer job moves all files instantly regardless of size."
[OK] Correct: Each file or data chunk still has to be copied, so more data means more total work and more time, even when the service copies many files in parallel.
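Parallel copying only divides the work by a constant. The sketch below assumes a hypothetical fixed pool of `workers` that each copy one object per round; the number of rounds is ceil(n / workers), which still grows linearly in n.

```python
def copy_rounds(num_files: int, workers: int) -> int:
    """Rounds needed if `workers` copies run at once (hypothetical model)."""
    if workers <= 0:
        raise ValueError("workers must be positive")
    return (num_files + workers - 1) // workers  # ceil(num_files / workers)


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        print(n, copy_rounds(n, workers=100))
```

With 100 workers, 1,000 files take 10 rounds and 100,000 files take 1,000 rounds: 100x the files still means 100x the rounds.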
Understanding how data transfer scales helps you design efficient cloud solutions and explain your reasoning clearly in discussions.
"What if we changed the transfer to move only changed files instead of all files? How would the time complexity change?"
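One possible way to reason about this question is sketched below under a simplified model: every source object is still examined and compared with the destination (modeled as a dictionary lookup on a hypothetical version tag per object), but only the k objects that are missing or changed get copied. Comparison stays O(n) while copying drops to O(k). The object names and version tags are made up for illustration.

```python
def incremental_sync(source: dict[str, str], destination: dict[str, str]) -> tuple[int, int]:
    """Copy only objects that are missing or differ at the destination.

    `source` and `destination` map object name -> version tag (a hypothetical
    stand-in for metadata such as a checksum). Returns (comparisons, copies).
    """
    comparisons = 0
    copies = 0
    for name, tag in source.items():       # every object is examined: O(n)
        comparisons += 1
        if destination.get(name) != tag:   # missing or changed at destination
            destination[name] = tag        # stand-in for one copy: O(k) in total
            copies += 1
    return comparisons, copies


if __name__ == "__main__":
    src = {f"file-{i}": "v1" for i in range(1000)}
    dst = dict(src)                        # destination starts fully in sync
    for i in range(5):                     # change 5 objects at the source
        src[f"file-{i}"] = "v2"
    print(incremental_sync(src, dst))      # (1000, 5)
```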
Practice
Solution
Step 1: Understand the service function
Storage Transfer Service is designed to move or copy data between storage locations such as on-premises file systems, AWS S3, and Google Cloud Storage.
Step 2: Eliminate unrelated options
Options B, C, and D describe different services unrelated to data transfer.
Final Answer:
To move or copy data between different storage locations automatically -> Option A
Quick Check:
Storage Transfer Service = Data movement [OK]
- Confusing transfer service with backup or monitoring tools
- Thinking it manages user permissions
- Assuming it only works within Google Cloud
Solution
Step 1: Identify valid source types
Storage Transfer Service supports sources such as Google Cloud Storage buckets, AWS S3 buckets, and on-premises data.
Step 2: Match correct JSON syntax for GCS source
The correct syntax uses "gcsDataSource" with a "bucketName" field, as shown in "source": {"gcsDataSource": {"bucketName": "my-source-bucket"}}. In the actual TransferJob resource, this object is nested inside "transferSpec", as in the job definition at the top of this page.
Final Answer:
"source": {"gcsDataSource": {"bucketName": "my-source-bucket"}} -> Option C
Quick Check:
Source config for GCS = "source": {"gcsDataSource": {"bucketName": "my-source-bucket"}} [OK]
- Using unsupported source types like VM or SQL database
- Incorrect JSON structure for source
- Confusing source with destination fields
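The sketch below builds the GCS source object from this question with Python's standard `json` module and nests it under "transferSpec", the way the job definition at the top of the page does. The bucket name comes from the question; the helper `gcs_source` is hypothetical, and a complete job would also need a sink such as "gcsDataSink".

```python
import json


def gcs_source(bucket_name: str) -> dict:
    """Build a Cloud Storage source object (REST-style field names)."""
    return {"gcsDataSource": {"bucketName": bucket_name}}


if __name__ == "__main__":
    transfer_spec = {"transferSpec": gcs_source("my-source-bucket")}
    print(json.dumps(transfer_spec))
    # {"transferSpec": {"gcsDataSource": {"bucketName": "my-source-bucket"}}}
```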
{"schedule": {"scheduleStartDate": {"year": 2024, "month": 6, "day": 10}, "startTimeOfDay": {"hours": 3, "minutes": 0}}}
When will the transfer job start?
Solution
Step 1: Read the scheduleStartDate and startTimeOfDay
The date is June 10, 2024, and the time is 3 hours and 0 minutes, which means 3:00 AM. Storage Transfer Service interprets startTimeOfDay in UTC.
Step 2: Confirm time format
The time is in 24-hour format, so 3 means 3 AM, not 3 PM.
Final Answer:
At 3:00 AM on June 10, 2024 -> Option A
Quick Check:
3 hours = 3 AM, date matches [OK]
- Mistaking 3 for 3 PM instead of 3 AM
- Ignoring the date and assuming the current day
- Confusing startTimeOfDay with duration
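To double-check the reading, the sketch below converts the schedule fields from this question into a Python `datetime`, treating startTimeOfDay as UTC, which is how Storage Transfer Service interprets it. The helper `schedule_start` is hypothetical; it reads only scheduleStartDate and startTimeOfDay and ignores any other schedule fields.

```python
from datetime import datetime, timezone


def schedule_start(schedule: dict) -> datetime:
    """Combine scheduleStartDate and startTimeOfDay into a UTC datetime."""
    date = schedule["scheduleStartDate"]
    time = schedule["startTimeOfDay"]
    return datetime(
        date["year"], date["month"], date["day"],
        time.get("hours", 0), time.get("minutes", 0),  # omitted fields treated as 0
        tzinfo=timezone.utc,
    )


if __name__ == "__main__":
    schedule = {"scheduleStartDate": {"year": 2024, "month": 6, "day": 10},
                "startTimeOfDay": {"hours": 3, "minutes": 0}}
    print(schedule_start(schedule).strftime("%I:%M %p on %B %d, %Y (UTC)"))
    # 03:00 AM on June 10, 2024 (UTC)
```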
{"transferJob": {"status": "ENABLED", "schedule": {"scheduleStartDate": {"year": 2024, "month": 7, "day": 20}, "startTimeOfDay": {"hours": 25, "minutes": 0}}}}
What is the problem?
Solution
Step 1: Check startTimeOfDay values
The hours field is set to 25, which is invalid because valid hours range from 0 to 23.
Step 2: Validate other fields
The scheduleStartDate is a valid calendar date, status is ENABLED (a valid value), and minutes is 0, which is also valid.
Final Answer:
The startTimeOfDay hours value is invalid; it must be between 0 and 23 -> Option D
Quick Check:
Hours must be 0-23; 25 is invalid [OK]
- Assuming status DISABLED starts the job
- Thinking minutes must be 30 or 60
- Ignoring invalid hour value
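A quick client-side check like the hypothetical `validate_time_of_day` below would catch this mistake before the request is sent. It applies the 0-23 hour range from this solution plus a 0-59 minute range (an assumption based on ordinary clock time); the service still performs its own validation.

```python
def validate_time_of_day(time_of_day: dict) -> list[str]:
    """Return a list of problems with a startTimeOfDay object (empty if valid)."""
    problems = []
    hours = time_of_day.get("hours", 0)
    minutes = time_of_day.get("minutes", 0)
    if not 0 <= hours <= 23:
        problems.append(f"hours={hours} is out of range 0-23")
    if not 0 <= minutes <= 59:
        problems.append(f"minutes={minutes} is out of range 0-59")
    return problems


if __name__ == "__main__":
    print(validate_time_of_day({"hours": 25, "minutes": 0}))  # ['hours=25 is out of range 0-23']
    print(validate_time_of_day({"hours": 3, "minutes": 0}))   # []
```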
Solution
Step 1: Identify source and destination
The source is an AWS S3 bucket, which requires AWS credentials (such as an access key ID and secret access key) for authentication. The destination is a Google Cloud Storage bucket.
Step 2: Set schedule for daily transfers
To transfer data daily, the schedule must be configured to repeat every day.
Final Answer:
Set AWS S3 as source with access keys, GCS bucket as destination, and schedule daily -> Option B
Quick Check:
AWS S3 source + credentials + daily schedule = Option B [OK]
- Forgetting AWS credentials
- Reversing source and destination
- Not setting a schedule for repeated transfers
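Here is a sketch of the job body this answer describes, using REST-style field names (awsS3DataSource, awsAccessKey, gcsDataSink, repeatInterval). The project, bucket names, start date, start time, and credentials are placeholders, and real credentials belong in a secure store rather than in source code. Leaving out scheduleEndDate is assumed here to keep the job repeating; treat the exact field set as an assumption to verify against the API reference.

```python
import json


def s3_to_gcs_daily_job(project_id: str, s3_bucket: str, gcs_bucket: str,
                        access_key_id: str, secret_access_key: str) -> dict:
    """Build a daily S3 -> GCS transfer job body (REST-style field names)."""
    return {
        "projectId": project_id,
        "status": "ENABLED",
        "transferSpec": {
            "awsS3DataSource": {
                "bucketName": s3_bucket,
                "awsAccessKey": {                  # AWS credentials for the source
                    "accessKeyId": access_key_id,
                    "secretAccessKey": secret_access_key,
                },
            },
            "gcsDataSink": {"bucketName": gcs_bucket},  # destination bucket
        },
        "schedule": {
            "scheduleStartDate": {"year": 2024, "month": 6, "day": 1},  # placeholder
            "startTimeOfDay": {"hours": 2, "minutes": 0},  # placeholder, 02:00 UTC
            "repeatInterval": "86400s",                    # every 24 hours
            # no scheduleEndDate: assumed to keep the job repeating
        },
    }


if __name__ == "__main__":
    job = s3_to_gcs_daily_job("my-project", "my-s3-bucket", "my-gcs-bucket",
                              "PLACEHOLDER_KEY_ID", "PLACEHOLDER_SECRET")
    print(json.dumps(job, indent=2))
```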
