Why data sources query existing infrastructure in Terraform - Performance Analysis
When Terraform uses data sources, it asks the cloud for information about resources already there.
We want to know how the time to get this information changes as we ask about more resources.
Analyze the time complexity of the following operation sequence.
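# Look up an existing EC2 instance by its Name tag (read-only; nothing is created).
data "aws_instance" "example" {
  filter {
    name   = "tag:Name"
    values = ["web-server"]
  }
}

# Expose the ID of the instance that the lookup returned.
output "instance_id" {
  value = data.aws_instance.example.id
}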
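This code asks AWS for an existing EC2 instance whose Name tag is "web-server" and exposes its ID as an output. The aws_instance data source expects exactly one match; if the filter matches zero or several instances, Terraform fails with an error.
Identify the API calls, resource provisioning steps, and data transfers that repeat.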
- Primary operation: Querying the cloud provider's API to find matching resources.
- How many times: Once per data source instance on each plan (a block that uses count or for_each is read once per instance).
As you add more data sources querying different resources, the number of API calls grows.
| Data Sources (n) | Approx. API Calls |
|---|---|
| 10 | 10 API calls |
| 100 | 100 API calls |
| 1000 | 1000 API calls |
Pattern observation: Each new data source adds one more API call, so the total grows directly with the number of data sources.
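The pattern above can be sketched with a single block that expands into n lookups. This is a minimal illustration, not a recommended layout: the variable name `server_names` and its default values are hypothetical, and each element of the set produces its own data source instance and therefore its own read.

```hcl
# Hypothetical input: one Name tag per server we want to look up.
variable "server_names" {
  type    = set(string)
  default = ["web-server", "api-server", "worker"]
}

# for_each creates one data source instance per name,
# so n names mean n separate lookups against the AWS API.
data "aws_instance" "by_name" {
  for_each = var.server_names

  filter {
    name   = "tag:Name"
    values = [each.value]
  }
}

# Map each name to the ID of the instance that was found.
output "instance_ids" {
  value = { for name, inst in data.aws_instance.by_name : name => inst.id }
}
```

Adding a fourth name to `server_names` adds a fourth lookup, which is the one-call-per-data-source growth shown in the table.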
Time Complexity: O(n)
This means the time to query existing infrastructure grows linearly with how many data sources you use.
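As a rough model of that linear growth, assume every read takes about the same time and that the reads do not depend on one another. The symbols t_call (time for one read) and p (number of reads running at once) are illustrative assumptions, not measured values. Terraform's -parallelism flag, 10 by default, caps how many graph operations run concurrently, so p is bounded by it:

```latex
T_{\text{sequential}}(n) \approx n \cdot t_{\text{call}}
\qquad
T_{\text{parallel}}(n) \approx \left\lceil \frac{n}{p} \right\rceil \cdot t_{\text{call}}
```

For a fixed p, both expressions are O(n): concurrency divides the wall-clock time by a constant factor, but the number of API calls is still n.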
[X] Wrong: "Data sources run once and then don't add to execution time no matter how many are used."
[OK] Correct: Each data source makes its own call to the cloud, so more data sources mean more calls and more time.
Understanding how querying existing resources scales helps you design efficient infrastructure code and reason about real-world cloud costs and delays.
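One way to apply this, sketched under assumptions: when several lookups differ only in a filter value, the plural aws_instances data source can fetch them in one read instead of n. The tag values below are hypothetical, and the provider may still page through results internally, so treat this as fewer reads rather than a guaranteed single HTTP request.

```hcl
# Sketch: one plural lookup instead of one data block per server.
# The tag values are hypothetical examples.
data "aws_instances" "servers" {
  filter {
    name   = "tag:Name"
    values = ["web-server", "api-server", "worker"]
  }
}

# ids is a list containing the ID of every instance that matched.
output "instance_ids" {
  value = data.aws_instances.servers.ids
}
```

The trade-off is that the result is a flat list of IDs rather than a name-to-ID map, so this fits best when the configuration needs the whole group and not one specific instance.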
"What if multiple data sources queried the same resource? How would the time complexity change?"