Pandas with SQL databases - Time & Space Complexity
When using pandas with SQL databases, it is important to understand how running time grows as data size increases. In other words: how does the cost of reading and writing data change as the table or result set gets larger?
Analyze the time complexity of the following code snippet.
```python
import pandas as pd
import sqlite3

# Open a connection to the SQLite database file
conn = sqlite3.connect('example.db')
# Run the query and load the matching rows into a DataFrame
df = pd.read_sql_query('SELECT * FROM sales WHERE amount > 100', conn)
conn.close()
```
This code connects to a SQL database, runs a query to get rows where amount is greater than 100, and loads the result into a pandas DataFrame.
Identify the loops, recursion, or array traversals that repeat.
- Primary operation: The database scans rows to find those matching the condition.
- How many times: Once per row; without an index, every row in the table must be checked against the condition.
As the number of rows in the database grows, the time to scan and filter grows roughly in proportion.
| Input Size (n rows) | Approx. Operations |
|---|---|
| 10 | About 10 row checks |
| 100 | About 100 row checks |
| 1000 | About 1000 row checks |
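The linear pattern in the table above can be checked directly: SQLite's `EXPLAIN QUERY PLAN` reveals whether the engine visits every row. Below is a minimal sketch using an in-memory database with a hypothetical `sales` schema (the column set is assumed, not from the original example):

```python
import sqlite3
import pandas as pd

# Build an in-memory 'sales' table (hypothetical schema) with 1000 rows
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL)')
conn.executemany('INSERT INTO sales (amount) VALUES (?)',
                 [(i,) for i in range(1000)])

# With no index on amount, SQLite reports a full table scan:
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM sales WHERE amount > 100").fetchall()
print(plan)  # the plan's detail column mentions 'SCAN' (a full scan)

# Load the matching rows into a DataFrame, as in the example above
df = pd.read_sql_query('SELECT * FROM sales WHERE amount > 100', conn)
print(len(df))  # 899 rows match (amounts 101..999)
conn.close()
```

Doubling the number of inserted rows roughly doubles both the rows scanned and the rows loaded, which is the O(n) pattern described above.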
Pattern observation: The work grows linearly as the number of rows increases.
Time Complexity: O(n)
With no index on `amount`, the query requires a full table scan, so the time to run the query and load the data grows in direct proportion to the number of rows in the table.
[X] Wrong: "Reading from a SQL database with pandas is always instant regardless of data size."
[OK] Correct: The database must scan rows to find matches, and pandas must load the data, so bigger data means more time.
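The loading cost on the pandas side can also be managed: `pd.read_sql_query` accepts a `chunksize` parameter that returns an iterator of DataFrames instead of one large frame. Total work is still O(n), but peak memory is bounded by one chunk. A minimal sketch with an in-memory table (schema assumed):

```python
import sqlite3
import pandas as pd

# Build a small in-memory 'sales' table (hypothetical schema)
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL)')
conn.executemany('INSERT INTO sales (amount) VALUES (?)',
                 [(i,) for i in range(500)])

# chunksize=100 streams the result as DataFrames of at most 100 rows;
# the database still checks every row, so total time remains O(n)
total = 0
for chunk in pd.read_sql_query(
        'SELECT * FROM sales WHERE amount > 100', conn, chunksize=100):
    total += len(chunk)
print(total)  # 399 rows match (amounts 101..499)
conn.close()
```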
Understanding how pandas interacts with SQL databases and how query size affects time helps you explain data loading performance clearly in real projects.
"What if we add an index on the 'amount' column in the database? How would the time complexity change?"