One Grammar, Many Dialects
Datoria parses, renders, and analyzes 15 SQL dialects from a single grammar definition. Each dialect gets its own generated parser, lexer, renderer, and typed AST — but they all share a consistent API and can be processed by the same semantic analysis tools.
One Grammar, All Dialects
All 15 dialects are defined in one grammar. Dialect-specific syntax is controlled by predicates: when(bigquery) activates BigQuery's STRUCT syntax, when(snowflake) activates QUALIFY, and so on. These predicates are evaluated at code generation time, so each dialect gets a specialized parser that performs no dialect checks at runtime.
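To illustrate the idea of resolving predicates at generation time, here is a minimal Python sketch. It is not Datoria's grammar format or generator: the `Alt` class, the `SHARED_GRAMMAR` table, the rule names, and the `specialize` function are all hypothetical, and the dialect sets are chosen only to mirror the two examples above, not to state which dialects support which syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alt:
    """One alternative of a grammar rule, optionally gated on dialects."""
    production: str
    when: frozenset = frozenset()  # empty set means "enabled for every dialect"


# Hypothetical shared grammar: every dialect reads from this single table.
SHARED_GRAMMAR = {
    "select_tail": [
        Alt("where_clause"),
        Alt("qualify_clause", when=frozenset({"snowflake"})),
    ],
    "type_expr": [
        Alt("scalar_type"),
        Alt("struct_type", when=frozenset({"bigquery"})),
    ],
}


def specialize(grammar, dialect):
    """Evaluate every predicate once, up front, for a single dialect.

    The result contains only plain production names, so a parser built
    from it has no dialect predicates left to check while parsing.
    """
    return {
        rule: [alt.production for alt in alts
               if not alt.when or dialect in alt.when]
        for rule, alts in grammar.items()
    }


if __name__ == "__main__":
    for dialect in ("bigquery", "snowflake", "postgres"):
        print(dialect, specialize(SHARED_GRAMMAR, dialect))
```

The point of the sketch is the shape of the pipeline: one shared definition goes in, and one predicate-free grammar per dialect comes out. A real generator would then emit a parser, lexer, renderer, and AST types from each specialized grammar.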
This is architecturally different from other multi-dialect parsers. SQLGlot maintains 31 hand-written dialect overrides. SDF/Fusion uses separate ANTLR grammars per dialect. Datoria defines the grammar once and generates everything.
The practical consequences:
- Adding a dialect is fast — Redshift was added in a day, SQLite in a week. You add predicates, not a new parser.
- Bug fixes propagate — fix a CTE parsing issue in the shared grammar and it's fixed in all 15 dialects.
- Consistent API — the same interface hierarchy, optimizer pipeline, formatter config, and error recovery work identically across all dialects.
Tests are identity roundtrips: parse SQL, render it back, and verify that the output is byte-identical to the input. The corpus is drawn from 33+ real-world sources, including PostgreSQL's pg_regress suite, DuckDB's test suite, and Apache Spark's tests. These are not synthetic: they're the tests the database vendors use to validate their own parsers.
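An identity roundtrip check can be sketched in a few lines of Python. This is a generic harness, not Datoria's test code: `roundtrip_failures` is a hypothetical helper, and `toy_parse`, `toy_render`, and `lossy_render` are stand-ins that only split and rejoin text so the example runs on its own. In practice the `parse` and `render` arguments would be a dialect's generated parser and renderer.

```python
import re
from typing import Any, Callable, Iterable, List, Tuple


def roundtrip_failures(
    corpus: Iterable[str],
    parse: Callable[[str], Any],
    render: Callable[[Any], str],
) -> List[Tuple[str, str]]:
    """Return (input, rendered) pairs whose bytes differ after parse + render."""
    failures = []
    for sql in corpus:
        rendered = render(parse(sql))
        # Compare encoded bytes, so whitespace and case differences count.
        if rendered.encode("utf-8") != sql.encode("utf-8"):
            failures.append((sql, rendered))
    return failures


def toy_parse(sql: str) -> List[str]:
    """Stand-in parser: split into whitespace and non-whitespace runs."""
    return re.findall(r"\s+|\S+", sql)


def toy_render(tokens: List[str]) -> str:
    """Stand-in renderer that keeps every token, including whitespace."""
    return "".join(tokens)


def lossy_render(tokens: List[str]) -> str:
    """Stand-in renderer that normalizes whitespace, so it fails the check."""
    return " ".join(t for t in tokens if not t.isspace())


if __name__ == "__main__":
    corpus = ["SELECT 1", "SELECT  a,\n       b FROM t"]
    print(roundtrip_failures(corpus, toy_parse, toy_render))
    print(roundtrip_failures(corpus, toy_parse, lossy_render))
```

A byte-level comparison is deliberately strict. A renderer that normalizes whitespace or keyword case would pass a looser "parses to the same tree" check but fail this one, which is why the lossy stand-in above is reported as a failure.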