0
0
PandasHow-ToBeginner · 3 min read

How to Write SQL Queries in Pandas Easily

You can write SQL queries in pandas by using the pandasql library, which lets you run SQL syntax directly on pandas DataFrames. Import pandasql, then use sqldf() to execute SQL queries on your data.
📐

Syntax

Use the pandasql library to run SQL queries on pandas DataFrames. The main function is sqldf(), which takes a SQL query string and a dictionary of local variables (usually locals()) containing your DataFrames.

  • sqldf(sql_query, env=locals()): Runs the SQL query on DataFrames in env.
  • sql_query: A string with your SQL SELECT statement.
  • env: A dictionary with DataFrame names as keys.
python
from pandasql import sqldf

# Syntax pattern:
result = sqldf("SELECT * FROM dataframe_name WHERE condition", locals())
💻

Example

This example shows how to select rows from a pandas DataFrame using an SQL query with pandasql. It demonstrates filtering data with a WHERE clause.

python
import pandas as pd
from pandasql import sqldf

# Create a sample DataFrame
data = {'name': ['Alice', 'Bob', 'Charlie', 'David'],
        'age': [25, 30, 35, 40],
        'city': ['NY', 'LA', 'NY', 'Chicago']}
df = pd.DataFrame(data)

# Write SQL query to select people older than 30
query = "SELECT name, age FROM df WHERE age > 30"

# Run SQL query on DataFrame
df_result = sqldf(query, locals())
print(df_result)
Output
name age 0 Charlie 35 1 David 40
⚠️

Common Pitfalls

Common mistakes when writing SQL in pandas include:

  • Not importing pandasql before using sqldf().
  • Using DataFrame names in SQL that don't match variable names in Python.
  • Forgetting to pass locals() or the correct environment dictionary to sqldf().
  • Trying to use SQL features not supported by pandasql (it supports basic SELECT, WHERE, JOIN, etc.).

Example of a wrong and right way:

python
import pandas as pd
from pandasql import sqldf

df = pd.DataFrame({'x': [1, 2, 3]})

# Wrong: dataframe name in SQL does not match Python variable
# result = sqldf("SELECT * FROM data", locals())  # 'data' does not exist

# Right: use correct DataFrame name
result = sqldf("SELECT * FROM df", locals())
print(result)
Output
x 0 1 1 2 2 3
📊

Quick Reference

Here is a quick reference for using SQL in pandas with pandasql:

ActionSyntax Example
Select all columnsSELECT * FROM df
Filter rowsSELECT * FROM df WHERE column > 10
Select specific columnsSELECT col1, col2 FROM df
Order resultsSELECT * FROM df ORDER BY col1 DESC
Join two DataFramesSELECT a.*, b.col FROM df1 a JOIN df2 b ON a.id = b.id

Key Takeaways

Use the pandasql library's sqldf() function to run SQL queries on pandas DataFrames.
Always pass locals() to sqldf() so it can find your DataFrames by name.
Make sure SQL table names match your pandas DataFrame variable names exactly.
pandasql supports basic SQL commands like SELECT, WHERE, JOIN, and ORDER BY.
Common errors come from mismatched names or forgetting to import pandasql.