
One Grammar, Many Dialects

Datoria parses, renders, and analyzes 15 SQL dialects from a single grammar definition. Each dialect gets its own generated lexer, parser, renderer, and typed AST, but all of them share a consistent API and can be processed by the same semantic analysis tools.

One Grammar, All Dialects

All 15 dialects are defined in one grammar. Dialect-specific syntax is controlled by predicates: when(bigquery) activates BigQuery's STRUCT syntax, when(snowflake) activates QUALIFY, and so on. At code generation time, these predicates are evaluated once per dialect, so each dialect gets a specialized parser that carries no runtime dialect checks.
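
To make generation-time specialization concrete, here is a minimal Python sketch. It is not Datoria's generator: the `Alt` and `specialize` names, the rule names, and the four-dialect list are hypothetical. Each grammar alternative carries an optional dialect predicate, and the predicates are evaluated once per dialect to produce a pruned grammar. Any parser code generated from that pruned grammar never has to check the dialect while parsing.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional

# Hypothetical dialect set; the real grammar covers 15 dialects.
ALL_DIALECTS = frozenset({"bigquery", "snowflake", "postgres", "duckdb"})


@dataclass(frozen=True)
class Alt:
    """One alternative of a grammar rule, optionally gated by a when(...) predicate."""
    name: str
    when: Optional[FrozenSet[str]] = None  # None means: active in every dialect

    def active_in(self, dialect: str) -> bool:
        return self.when is None or dialect in self.when


# Toy shared grammar: rule name -> alternatives (rule names are illustrative only).
GRAMMAR: Dict[str, List[Alt]] = {
    "type": [
        Alt("simple_type"),
        Alt("struct_type", when=frozenset({"bigquery"})),  # like when(bigquery)
    ],
    "select_tail": [
        Alt("where_clause"),
        Alt("qualify_clause", when=frozenset({"snowflake"})),  # like when(snowflake)
    ],
}


def specialize(grammar: Dict[str, List[Alt]], dialect: str) -> Dict[str, List[str]]:
    """Evaluate every predicate once, at generation time, keeping only active alternatives."""
    if dialect not in ALL_DIALECTS:
        raise ValueError(f"unknown dialect: {dialect}")
    return {
        rule: [alt.name for alt in alts if alt.active_in(dialect)]
        for rule, alts in grammar.items()
    }


# One pruned grammar per dialect; a real generator would emit parser code from each.
SPECIALIZED = {d: specialize(GRAMMAR, d) for d in sorted(ALL_DIALECTS)}

print(SPECIALIZED["bigquery"]["type"])          # ['simple_type', 'struct_type']
print(SPECIALIZED["postgres"]["type"])          # ['simple_type']
print(SPECIALIZED["snowflake"]["select_tail"])  # ['where_clause', 'qualify_clause']
```

Because pruning happens before any parser code is emitted, the PostgreSQL output in this model simply has no `struct_type` branch to skip. That is how the sketch avoids paying a runtime cost for dialects it is not parsing.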

This differs architecturally from other widely used multi-dialect parsers. SQLGlot maintains 31 hand-written dialect overrides, and SDF/Fusion uses a separate ANTLR grammar for each dialect. Datoria defines the grammar once and generates everything from it.

The practical consequences:

  • Adding a dialect is fast: Redshift was added in a day, SQLite in a week. You add predicates, not a new parser.
  • Bug fixes propagate: fix a CTE parsing issue in the shared grammar and it is fixed in all 15 dialects.
  • Consistent API: the same interface hierarchy, optimizer pipeline, formatter config, and error recovery work identically across all dialects.
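
The consistent-API point is easiest to see from the consumer side. In the hypothetical Python sketch below, every dialect's AST extends one shared node hierarchy, dialect-only constructs are just additional subclasses, and an analysis such as collecting referenced tables is written once against the base interface. The class and function names (`Node`, `TableRef`, `BigQueryStructLiteral`, `referenced_tables`, and so on) are illustrative assumptions, not Datoria's actual types.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    """Hypothetical shared base interface implemented by every dialect's AST nodes."""
    children: List["Node"] = field(default_factory=list)

    def walk(self) -> Iterator["Node"]:
        """Pre-order traversal that works the same for any dialect's tree."""
        yield self
        for child in self.children:
            yield from child.walk()


@dataclass
class Select(Node):
    pass


@dataclass
class TableRef(Node):
    name: str = ""


# Dialect-only constructs extend the shared hierarchy rather than replacing it.
@dataclass
class BigQueryStructLiteral(Node):
    pass


@dataclass
class SnowflakeQualify(Node):
    pass


def referenced_tables(root: Node) -> List[str]:
    """One analysis pass, written once, usable on any dialect's AST."""
    return [node.name for node in root.walk() if isinstance(node, TableRef)]


bigquery_ast = Select(children=[TableRef(name="orders"), BigQueryStructLiteral()])
snowflake_ast = Select(children=[TableRef(name="events"), SnowflakeQualify()])

print(referenced_tables(bigquery_ast))   # ['orders']
print(referenced_tables(snowflake_ast))  # ['events']
```

In this model, a fix to `walk` or `referenced_tables` applies to every dialect at once, which is the API-level counterpart of the bug-fix propagation point above.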

Tests are identity roundtrips: parse the SQL, render it back, and verify that the output is byte-identical to the input. The corpus is drawn from 33+ real-world sources, including PostgreSQL's pg_regress suite, DuckDB's test suite, and Apache Spark's tests, among others. These are not synthetic: they are the test suites the database projects themselves use for validation.
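
The roundtrip check itself is a single comparison; what it really tests is whether the renderer preserves every byte, including whitespace and comments. The Python sketch below assumes a toy lossless lexer (not Datoria's parser) that keeps whitespace and comments as tokens, so rendering is plain concatenation. The helper names (`lex`, `render`, `lossy_render`, `roundtrip_ok`, `run_corpus`) and the two sample statements are hypothetical. A formatter-style renderer that normalizes whitespace is included to show what the identity check catches.

```python
import re
from typing import Callable, List, Tuple

Token = Tuple[str, str]  # (kind, text)

# Toy lossless lexer: every input character ends up in exactly one token,
# including whitespace and comments, which a real parser would keep as trivia.
TOKEN_RE = re.compile(
    r"(?P<ws>\s+)"
    r"|(?P<comment>--[^\n]*)"
    r"|(?P<string>'(?:[^']|'')*')"
    r"|(?P<word>[A-Za-z_][A-Za-z_0-9]*)"
    r"|(?P<number>\d+(?:\.\d+)?)"
    r"|(?P<punct>.)",
    re.DOTALL,
)


def lex(sql: str) -> List[Token]:
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(sql)]


def render(tokens: List[Token]) -> str:
    """Lossless renderer: concatenating the tokens restores the input exactly."""
    return "".join(text for _, text in tokens)


def lossy_render(tokens: List[Token]) -> str:
    """Formatter-style renderer that drops comments and collapses whitespace."""
    return " ".join(text for kind, text in tokens if kind not in ("ws", "comment"))


def roundtrip_ok(sql: str,
                 parse_fn: Callable[[str], List[Token]],
                 render_fn: Callable[[List[Token]], str]) -> bool:
    """Identity roundtrip: parse, render back, compare the UTF-8 bytes."""
    return render_fn(parse_fn(sql)).encode("utf-8") == sql.encode("utf-8")


def run_corpus(cases: List[str],
               parse_fn: Callable[[str], List[Token]],
               render_fn: Callable[[List[Token]], str]) -> List[str]:
    """Return every case whose roundtrip output is not byte-identical to its input."""
    return [sql for sql in cases if not roundtrip_ok(sql, parse_fn, render_fn)]


CORPUS = [
    "SELECT a, b FROM t WHERE x = 'it''s'  -- trailing comment\n",
    "select  *\n  from   orders\tqualify row_number() over (partition by id) = 1",
]

print(run_corpus(CORPUS, lex, render))             # []
print(len(run_corpus(CORPUS, lex, lossy_render)))  # 2
```

A real suite would load its cases from the upstream test files rather than an inline list, but the pass/fail criterion stays the same single comparison.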