BigQuery SQL and pricing model in GCP - Time & Space Complexity
We want to understand how the cost and running time of BigQuery SQL queries grow as the amount of data increases.
Specifically, how does scanning more data affect the number of operations performed and the price charged?
Analyze the time complexity of running a simple SELECT query scanning a large table.
```sql
SELECT user_id, COUNT(*) AS visits
FROM `project.dataset.web_logs`
WHERE event_date BETWEEN '2024-01-01' AND '2024-01-31'
GROUP BY user_id
ORDER BY visits DESC
LIMIT 10
```
This query scans web log data for one month, groups by user, and returns the top 10 visitors.
Look at what happens repeatedly during query execution.
- Primary operation: Scanning data blocks from storage.
- How many times: Once for each data block that falls within the queried date range.
- Secondary operations: Grouping and sorting happen after scanning, and their work also scales with the amount of data scanned.
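The pipeline above (scan, then group, then return the top results) can be sketched in Python. This is an illustrative model, not BigQuery's actual execution engine; the tiny `rows` list stands in for the rows of `web_logs` within the date range:

```python
from collections import Counter
import heapq

# Hypothetical stand-in for the web_logs rows matching the date filter.
rows = [("u1",), ("u2",), ("u1",), ("u3",), ("u1",), ("u2",)]

scan_ops = 0
counts = Counter()
for (user_id,) in rows:      # scan: one operation per row read -> O(n)
    scan_ops += 1
    counts[user_id] += 1     # grouping: hash aggregate, O(1) work per row

# ORDER BY visits DESC LIMIT 10: only the top k entries are needed, so a
# heap gives O(g log k) over g groups instead of a full O(g log g) sort.
top10 = heapq.nlargest(10, counts.items(), key=lambda kv: kv[1])

print(scan_ops)    # 6: one operation per scanned row
print(top10[0])    # ('u1', 3): the most frequent visitor
```

Every stage's work is bounded by the number of rows scanned, which is why the scan dominates the analysis.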
As the amount of data scanned grows, the number of data blocks read grows roughly in proportion.
| Input Size (GB scanned) | Approx. Data Blocks Read |
|---|---|
| 10 | 10 blocks |
| 100 | 100 blocks |
| 1000 | 1000 blocks |
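The table's linear pattern can be turned into a simple cost model. The rate below is an assumed illustrative figure, not the current published price; check the GCP pricing page for the actual on-demand rate:

```python
# Illustrative linear model of BigQuery on-demand pricing.
PRICE_PER_TB = 6.25  # USD per TiB scanned -- assumed for illustration only

def estimated_cost_usd(gb_scanned: float) -> float:
    """Cost grows linearly with bytes scanned: O(n)."""
    return gb_scanned / 1024 * PRICE_PER_TB

for gb in (10, 100, 1000):
    print(f"{gb:>5} GB -> ${estimated_cost_usd(gb):.4f}")
```

Scanning 100x the data costs 100x as much: the function is linear in its input, with the price per terabyte as the constant factor.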
Pattern observation: The scanning cost grows linearly with the amount of data scanned.
Time Complexity: O(n)
This means the time and cost increase in direct proportion to the amount of data scanned: double the bytes read, and you roughly double both the work and the bill.
[X] Wrong: "Query cost stays the same no matter how much data is scanned."
[OK] Correct: BigQuery's on-demand pricing bills by the number of bytes a query reads, so scanning more data costs more time and money.
Understanding how query cost grows with data size helps you design efficient queries and manage cloud costs wisely.
What if we added a filter that reduces scanned data by half? How would the time complexity and cost change?
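One way to reason about this question: halving the scanned data halves the constant factor (bytes read, dollars billed), but the growth rate stays O(n). A quick self-contained sketch, using an assumed illustrative price per terabyte:

```python
# Halving the scanned data halves a linear cost, but the complexity
# class is unchanged: cost(n / 2) == cost(n) / 2 for any linear cost(n).
PRICE_PER_TB = 6.25  # assumed illustrative on-demand rate, USD per TiB

def cost(gb: float) -> float:
    return gb / 1024 * PRICE_PER_TB

full = cost(1000)    # cost of scanning the whole month
halved = cost(500)   # cost after a filter cuts the scan in half

print(halved == full / 2)  # True: same O(n), half the constant
```

In practice this is why partitioning and clustering matter: they do not change the complexity class, but they shrink `n`, and with linear pricing a smaller `n` translates directly into a smaller bill.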