How to Write SQL Queries in Pandas Easily
You can write SQL queries in pandas by using the
pandasql library, which lets you run SQL syntax directly on pandas DataFrames. Import pandasql, then use sqldf() to execute SQL queries on your data.Syntax
Use the pandasql library to run SQL queries on pandas DataFrames. The main function is sqldf(), which takes a SQL query string and a dictionary of local variables (usually locals()) containing your DataFrames.
sqldf(sql_query, env=locals()): Runs the SQL query on DataFrames inenv.sql_query: A string with your SQL SELECT statement.env: A dictionary with DataFrame names as keys.
python
from pandasql import sqldf # Syntax pattern: result = sqldf("SELECT * FROM dataframe_name WHERE condition", locals())
Example
This example shows how to select rows from a pandas DataFrame using an SQL query with pandasql. It demonstrates filtering data with a WHERE clause.
python
import pandas as pd from pandasql import sqldf # Create a sample DataFrame data = {'name': ['Alice', 'Bob', 'Charlie', 'David'], 'age': [25, 30, 35, 40], 'city': ['NY', 'LA', 'NY', 'Chicago']} df = pd.DataFrame(data) # Write SQL query to select people older than 30 query = "SELECT name, age FROM df WHERE age > 30" # Run SQL query on DataFrame df_result = sqldf(query, locals()) print(df_result)
Output
name age
0 Charlie 35
1 David 40
Common Pitfalls
Common mistakes when writing SQL in pandas include:
- Not importing
pandasqlbefore usingsqldf(). - Using DataFrame names in SQL that don't match variable names in Python.
- Forgetting to pass
locals()or the correct environment dictionary tosqldf(). - Trying to use SQL features not supported by
pandasql(it supports basic SELECT, WHERE, JOIN, etc.).
Example of a wrong and right way:
python
import pandas as pd from pandasql import sqldf df = pd.DataFrame({'x': [1, 2, 3]}) # Wrong: dataframe name in SQL does not match Python variable # result = sqldf("SELECT * FROM data", locals()) # 'data' does not exist # Right: use correct DataFrame name result = sqldf("SELECT * FROM df", locals()) print(result)
Output
x
0 1
1 2
2 3
Quick Reference
Here is a quick reference for using SQL in pandas with pandasql:
| Action | Syntax Example |
|---|---|
| Select all columns | SELECT * FROM df |
| Filter rows | SELECT * FROM df WHERE column > 10 |
| Select specific columns | SELECT col1, col2 FROM df |
| Order results | SELECT * FROM df ORDER BY col1 DESC |
| Join two DataFrames | SELECT a.*, b.col FROM df1 a JOIN df2 b ON a.id = b.id |
Key Takeaways
Use the pandasql library's sqldf() function to run SQL queries on pandas DataFrames.
Always pass locals() to sqldf() so it can find your DataFrames by name.
Make sure SQL table names match your pandas DataFrame variable names exactly.
pandasql supports basic SQL commands like SELECT, WHERE, JOIN, and ORDER BY.
Common errors come from mismatched names or forgetting to import pandasql.