Cost-based optimization in DBMS Theory - Time & Space Complexity
Start learning this pattern below
Jump into concepts and practice - no test required
Cost-based optimization helps a database decide the best way to run a query efficiently.
We want to know how the time to find the best plan grows as the query or data size grows.
Analyze the time complexity of this simplified cost-based optimizer process.
-- Pseudocode for cost-based optimization
FOR each possible query plan P
ESTIMATE cost of P
SELECT plan with lowest cost
This code tries all possible ways to run a query, estimates their costs, and picks the cheapest one.
Look for repeated steps that take most time.
- Primary operation: Checking each possible query plan.
- How many times: Number of plans grows with query complexity, often very fast.
As the query gets more parts (like joins), the number of plans to check grows quickly.
| Input Size (number of joins) | Approx. Plans Checked |
|---|---|
| 2 | 2 |
| 4 | ~20 |
| 6 | ~400 |
Pattern observation: The number of plans grows very fast, much faster than just doubling.
Time Complexity: O(n!)
This means the time to find the best plan grows extremely fast as the query size increases.
[X] Wrong: "The optimizer checks only a few plans, so it runs quickly for any query."
[OK] Correct: In reality, the number of plans can explode with query size, making optimization costly.
Understanding how optimization time grows helps you appreciate database design and query tuning skills.
What if the optimizer used heuristics to skip some plans? How would the time complexity change?
Practice
Solution
Step 1: Understand the purpose of cost-based optimization
Cost-based optimization evaluates different ways to run a query and estimates their costs.Step 2: Identify the main goal
The goal is to pick the plan with the lowest cost, meaning the fastest or least resource-heavy execution.Final Answer:
To find the most efficient way to execute a query -> Option AQuick Check:
Cost-based optimization = efficient query execution [OK]
- Confusing optimization with data storage
- Thinking it handles security tasks
- Assuming it manages backups
Solution
Step 1: Identify what cost-based optimizers use
They rely on data statistics such as table size, number of rows, and indexes to estimate costs.Step 2: Match the correct input
Data statistics directly affect the cost estimation, unlike user credentials or backup schedules.Final Answer:
Data statistics like table size and index availability -> Option BQuick Check:
Cost estimation uses data statistics [OK]
- Confusing user data with statistics
- Thinking network or backups affect cost estimation
- Ignoring the role of indexes
Solution
Step 1: Understand cost comparison
The optimizer picks the plan with the lowest estimated cost to improve performance.Step 2: Compare given costs
Plan A costs 50 units, which is less than Plan B's 80 units, so Plan A is preferred.Final Answer:
Plan A because it has a lower cost -> Option DQuick Check:
Lower cost plan chosen = Plan A [OK]
- Picking higher cost plan mistakenly
- Ignoring cost values
- Assuming cost is irrelevant
Solution
Step 1: Identify factors affecting optimizer decisions
The optimizer depends on accurate data statistics to estimate costs correctly.Step 2: Analyze the problem cause
If statistics are outdated, the optimizer may pick a suboptimal plan, causing slower queries.Final Answer:
The data statistics are outdated or inaccurate -> Option AQuick Check:
Outdated stats cause wrong plan choice [OK]
- Blaming server status without checking stats
- Confusing syntax errors with optimization issues
- Assuming permissions affect plan choice
Solution
Step 1: Understand index selection by cost-based optimizer
The optimizer calculates the cost of using each index based on statistics like selectivity and size.Step 2: Apply cost comparison to index choice
It chooses the index that results in the lowest estimated cost for the query execution.Final Answer:
It estimates the cost of using each index and picks the cheaper one -> Option CQuick Check:
Index choice based on cost estimation [OK]
- Assuming fixed index usage without cost check
- Thinking optimizer ignores indexes
- Believing it uses multiple indexes without cost analysis
