Query Optimization
The optimizer transforms SQL ASTs through a multi-pass pipeline that resolves every reference, qualifies every column, expands every SELECT *, and simplifies expressions — all while preserving semantics and lossless roundtripping.
Why This Matters
The optimizer isn't primarily about query performance. It's the foundation that makes lineage, type inference, and every downstream analysis possible.
Before you can trace column lineage, you need every column qualified to its source table. Before you can infer types, you need SELECT * expanded to concrete columns. Before you can detect unused joins or dead columns, you need the full scope graph resolved.
For data products: if your product offers column-level lineage, impact analysis, dead column detection, or schema validation -- the optimizer is what makes those features accurate. Without full column qualification and star expansion, lineage tools can only give you table-level results.
For query authors: the optimizer can show the "canonical" form of any query -- fully qualified, simplified, with unnecessary subqueries eliminated. This is useful for understanding complex SQL, finding redundancies, and validating that a rewrite is semantically equivalent.
Optimization Passes
| Pass | What It Does |
|---|---|
| Normalize identifiers | Case-normalizes identifiers per dialect conventions (e.g., PostgreSQL folds to lowercase, Oracle to uppercase) |
| Qualify tables | Adds self-aliases to table references (FROM users becomes FROM users AS users) |
| Isolate table selects | Wraps bare table references so downstream passes can push projections into them |
| Qualify columns | Resolves every column reference to its source table, expands SELECT * to explicit columns, adds table qualifiers |
| Push down projections | Removes unused columns from inner queries -- if an outer query only uses 3 of 20 columns, the inner query's SELECT list is trimmed |
| Normalize predicates | Converts WHERE conditions to conjunctive normal form (AND of ORs) for consistent processing |
| Unnest subqueries | Converts correlated subqueries to equivalent JOINs where possible |
| Push down predicates | Moves WHERE conditions from outer queries into subqueries and derived tables |
| Optimize joins | Reorders and restructures JOIN conditions for better execution |
| Eliminate subqueries | Converts duplicate derived tables into CTEs to avoid repeated computation |
| Merge subqueries | Inlines single-use CTEs and simple derived tables, reducing query nesting |
| Eliminate joins | Removes JOINs that provably don't affect output (e.g., LEFT JOINs where no columns from the right table are used) |
| Eliminate CTEs | Removes CTEs that are no longer referenced after other optimization passes |
| Annotate types | Runs type inference to determine the data type of every expression |
| Canonicalize and simplify | Normalizes expressions to standard forms, folds constants, applies boolean algebra, removes tautologies |
Each pass is a pure function from immutable AST to immutable AST. Composable, testable, dialect-aware.
Scope Resolution
The optimizer builds a scope graph representing every level of the query:
- Root scope -- the outermost SELECT
- Subquery scopes -- scalar subqueries, IN/EXISTS subqueries
- Derived table scopes -- inline views in FROM clauses
- CTE scopes -- WITH clause definitions
- UNION scopes -- each branch of a set operation
- Pipe step scopes -- BigQuery pipe syntax operators
Every column reference is resolved to a specific source table and column, with fully qualified names. This resolution is what enables accurate lineage, type inference, and safe optimization.
Properties
- Type-safe -- operates on immutable, typed ASTs with dedicated interface types
- Lossless -- roundtripping is preserved through all transformations; optimized SQL can still be rendered back to valid, readable text
- Scope-aware -- every pass that needs to understand column references works through the scope system
- Dialect-aware -- identifier normalization, quoting rules, and function resolution respect the target dialect