Concept Flow - Understanding the Catalyst optimizer
Input: SQL/DataFrame Query
Parsing: Convert query to an unresolved Logical Plan
Analysis: Resolve references against the catalog, check schema
Optimization: Apply rules to Logical Plan
Physical Planning: Create Physical Plans
Cost Model: Select best Physical Plan
Execution: Run selected plan on Spark cluster
The Catalyst optimizer takes a query, turns it into a logical plan, improves that plan step by step, generates candidate physical plans, and selects the one that Spark then executes.
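The optimization and plan-selection steps in the flow above can be sketched in plain Python. This is a toy model under stated assumptions, not Spark's API: the node classes (Lit, Col, Add), the two rules, the helper names, and the cost numbers are all hypothetical, chosen only to show the idea of rewriting a plan tree with rules until nothing changes and then picking the cheapest candidate.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union


# Hypothetical expression nodes standing in for a logical plan tree.
@dataclass(frozen=True)
class Lit:
    value: int


@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"


Expr = Union[Lit, Col, Add]
Rule = Callable[[Expr], Expr]


def transform_up(expr: Expr, rule: Rule) -> Expr:
    """Rewrite the children first, then apply the rule to the node itself."""
    if isinstance(expr, Add):
        expr = Add(transform_up(expr.left, rule), transform_up(expr.right, rule))
    return rule(expr)


def fold_constants(expr: Expr) -> Expr:
    """Rule: replace Add(Lit, Lit) with a single literal."""
    if isinstance(expr, Add) and isinstance(expr.left, Lit) and isinstance(expr.right, Lit):
        return Lit(expr.left.value + expr.right.value)
    return expr


def drop_add_zero(expr: Expr) -> Expr:
    """Rule: x + 0 and 0 + x both simplify to x."""
    if isinstance(expr, Add):
        if expr.left == Lit(0):
            return expr.right
        if expr.right == Lit(0):
            return expr.left
    return expr


def optimize(expr: Expr, rules: List[Rule], max_iterations: int = 100) -> Expr:
    """Apply every rule repeatedly until the tree stops changing (a fixed point)."""
    for _ in range(max_iterations):
        new_expr = expr
        for rule in rules:
            new_expr = transform_up(new_expr, rule)
        if new_expr == expr:
            break
        expr = new_expr
    return expr


def choose_cheapest(candidates: Dict[str, float]) -> str:
    """Pick the candidate physical plan with the lowest estimated cost."""
    return min(candidates, key=candidates.get)


# Logical plan for the expression: x + (1 + -1)
plan = Add(Col("x"), Add(Lit(1), Lit(-1)))
optimized = optimize(plan, [fold_constants, drop_add_zero])

# Made-up cost estimates for two candidate physical plans.
best = choose_cheapest({"SortMergeJoin": 120.0, "BroadcastHashJoin": 35.0})

print(optimized)
print(best)
```

Here the first pass folds 1 + -1 into 0, the second rule then removes the addition of zero, and a final pass confirms that nothing else changes. In Spark itself, calling explain(True) on a DataFrame prints the parsed, analyzed, and optimized logical plans along with the physical plan, which is a practical way to watch these stages on a real query.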