Consider the tables Employees(emp_id, dept_id, salary) and Departments(dept_id, dept_name). What is the output of the following query?
SELECT emp_id, salary FROM Employees e WHERE salary > (SELECT AVG(salary) FROM Employees WHERE dept_id = e.dept_id);
Assume the data:
Employees: (1, 10, 5000), (2, 10, 6000), (3, 20, 4000), (4, 20, 4500)
SELECT emp_id, salary FROM Employees e WHERE salary > (SELECT AVG(salary) FROM Employees WHERE dept_id = e.dept_id);
Think about how the average salary is calculated per department and which employees have salaries above that average.
The subquery calculates the average salary for the employee's department. For dept 10, average is (5000+6000)/2=5500. Only emp_id 2 has salary 6000 > 5500. For dept 20, average is (4000+4500)/2=4250. emp_id 4 has salary 4500 > 4250 (emp_id 3: 4000 < 4250). Thus, the query returns emp_id 2, salary 6000 and emp_id 4, salary 4500, which is option A.
Which option contains a syntax error in the correlated subquery?
SELECT e1.emp_id FROM Employees e1 WHERE e1.salary > (SELECT AVG(e2.salary) FROM Employees e2 WHERE e2.dept_id = e1.dept_id);
Look for missing table aliases or ambiguous column references.
Option D has WHERE dept_id = dept_id which compares the same column without alias, causing ambiguity and logical error. It is syntactically valid but logically incorrect. However, since no alias is used, it can cause confusion. Other options correctly use aliases.
Given the query:
SELECT emp_id FROM Employees e WHERE salary > (SELECT AVG(salary) FROM Employees WHERE dept_id = e.dept_id);
Which option is the best way to optimize this query for large datasets?
Think about avoiding repeated subquery execution for each row.
Correlated subqueries run the inner query for each outer row, which is expensive. Precomputing averages with a JOIN and GROUP BY avoids repeated calculations, improving performance. Indexing salary helps but does not eliminate repeated subqueries. DISTINCT and replacing AVG with MAX do not optimize correlated subqueries effectively.
Why does the following query return no rows?
SELECT emp_id FROM Employees e WHERE salary > (SELECT MAX(salary) FROM Employees WHERE dept_id = e.dept_id AND emp_id <> e.emp_id);
Assume each department has only one employee.
Consider what happens when the subquery has no matching rows.
If a department has only one employee, the subquery filters out that employee (emp_id <> e.emp_id), so no rows match. The subquery returns NULL. Comparing salary > NULL yields unknown (false), so no rows are returned.
Which statement best describes how a correlated subquery executes?
Think about how the inner query depends on the outer query's current row.
Correlated subqueries execute the inner query once per each row of the outer query, using values from that row to filter or compute results. This is why they can be slower than uncorrelated subqueries.