Given two tables employees and departments, which query returns the names of employees who work in the 'Sales' department?
Table employees: (id, name, dept_id)
Table departments: (id, dept_name)
SELECT e.name FROM employees e JOIN departments d ON e.dept_id = d.id WHERE d.dept_name = 'Sales';
Think about how JOIN matches rows and how subqueries filter results.
Option A uses a JOIN correctly to match employees with the 'Sales' department. Option A uses a subquery but assumes only one department id, which is valid here but less flexible. Option A uses IN which works but is less efficient. Option A is invalid because departments has no dept_id column.
Which statement best describes the typical performance difference between using a JOIN and a subquery in SQL when retrieving related data?
Consider how databases optimize queries and use indexes.
JOINs typically allow the database engine to optimize data retrieval better by using indexes and combining tables efficiently. Subqueries can sometimes be less efficient, especially correlated subqueries.
Which SQL query has a syntax error when trying to select employees working in the 'Marketing' department?
SELECT name FROM employees WHERE dept_id = (SELECT id FROM departments WHERE dept_name = 'Marketing');
Check if the column names in the subquery exist in the referenced table.
Option D references dept_id in departments which does not exist. The correct column is id. Other options use valid syntax.
You want to list all employees with their department names. Which query is generally more efficient on large datasets?
Consider how JOINs and subqueries execute on large tables.
Option C uses an explicit JOIN which is generally optimized better by databases. Option C uses a correlated subquery which can be slow on large datasets. Option C uses implicit join syntax which is less clear but similar to A. Option C filters out employees without departments, which may not be desired.
A query uses a correlated subquery to count orders per customer:
SELECT c.customer_id, c.name, (SELECT COUNT(*) FROM orders o WHERE o.customer_id = c.customer_id) AS order_count FROM customers c;
It runs very slowly on large data. What is the main reason?
Think about how correlated subqueries execute for each row.
Correlated subqueries run once for each row in the outer query, causing repeated scans of the orders table, which is inefficient on large datasets. Using a JOIN with GROUP BY is usually faster.