One grammar, many dialects

Datoria parses, renders, and analyzes 15 SQL dialects from a single grammar definition. Each dialect gets its own generated parser, lexer, renderer, and typed AST. They share a consistent API and the same semantic analysis tools.

One grammar, all dialects

All 15 dialects live in one grammar. Dialect-specific syntax is gated by predicates: when(bigquery) activates BigQuery's STRUCT syntax, when(snowflake) activates QUALIFY, and so on. At codegen time the predicates are evaluated, and each dialect gets a specialized parser with no runtime branching.
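
To make the idea concrete, here is a minimal sketch of predicate-gated specialization. It is not Datoria's grammar format or code generator: the `Alternative` class, the `GRAMMAR` table, the rule names, and the `specialize` function are all hypothetical, and the dialect gates mirror only the two examples mentioned above rather than real feature coverage. The point is that predicates are evaluated once, ahead of time, so the per-dialect result contains no predicates at all.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class Alternative:
    """One alternative of a grammar rule, optionally gated by dialect (hypothetical)."""
    name: str
    when: Optional[FrozenSet[str]] = None  # None means "available in every dialect"


# Hypothetical shared grammar: rule name -> ordered alternatives.
GRAMMAR: Dict[str, List[Alternative]] = {
    "type_expr": [
        Alternative("simple_type"),
        Alternative("struct_type", when=frozenset({"bigquery"})),
    ],
    "select_clause": [
        Alternative("where"),
        Alternative("group_by"),
        Alternative("qualify", when=frozenset({"snowflake"})),
    ],
}


def specialize(grammar: Dict[str, List[Alternative]], dialect: str) -> Dict[str, List[str]]:
    """Evaluate every predicate for one dialect and drop the gated-out alternatives.

    The returned grammar holds plain alternative names only, so a parser
    generated from it never needs to ask which dialect it is running as.
    """
    return {
        rule: [alt.name for alt in alts if alt.when is None or dialect in alt.when]
        for rule, alts in grammar.items()
    }


bigquery_grammar = specialize(GRAMMAR, "bigquery")
snowflake_grammar = specialize(GRAMMAR, "snowflake")
```

In this toy model, adding a dialect means adding its name to the relevant `when` sets and calling `specialize` once more; the shared alternatives are untouched.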

That's architecturally different from every other multi-dialect parser. SQLGlot maintains 31 hand-written dialect overrides. SDF/Fusion uses separate ANTLR grammars per dialect. Datoria writes the grammar once and generates the rest.

The practical consequences:

  • Adding a dialect is fast. Redshift was added in a day, SQLite in a week. You add predicates, not a new parser.
  • Bug fixes propagate. Fix a CTE parsing issue in the shared grammar and it's fixed in all 15 dialects.
  • Consistent API. The same interface hierarchy, optimizer pipeline, formatter config, and error recovery work identically across all dialects.

Tests are identity roundtrips: parse SQL, render it back, and verify that the output is byte-identical to the input. The corpus comes from 34+ real-world sources, including PostgreSQL's pg_regress suite, the DuckDB test suite, and Apache Spark tests. The corpus is not synthetic: these are the tests the database vendors themselves use.
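
The shape of such a roundtrip check can be sketched as follows. This is an illustration under stated assumptions, not Datoria's test harness: `toy_parse` and `toy_render` are hypothetical stand-ins (a whitespace-preserving tokenizer, not a real SQL parser), and `roundtrip_failures` is an invented helper name. Any parser and renderer pair with the same shape could be passed in.

```python
import re
from typing import Callable, Iterable, List


def toy_parse(sql: str) -> List[str]:
    """Stand-in 'parser': split into whitespace runs, word runs, and single symbols.

    Every character falls into exactly one of the three patterns, so no input
    text is dropped.
    """
    return re.findall(r"\s+|\w+|[^\w\s]", sql)


def toy_render(tokens: List[str]) -> str:
    """Stand-in 'renderer': concatenate tokens exactly as they were read."""
    return "".join(tokens)


def lossy_render(tokens: List[str]) -> str:
    """A renderer that normalizes whitespace, which breaks byte identity."""
    return " ".join(t for t in tokens if not t.isspace())


def roundtrip_failures(
    corpus: Iterable[str],
    parse: Callable[[str], List[str]],
    render: Callable[[List[str]], str],
) -> List[str]:
    """Return every input whose rendered form is not byte-identical to the source."""
    failures = []
    for sql in corpus:
        rendered = render(parse(sql))
        if rendered.encode("utf-8") != sql.encode("utf-8"):
            failures.append(sql)
    return failures


corpus = [
    "SELECT a,  b FROM t WHERE a > 1;",
    "select *\nfrom  t -- trailing comment\n",
]

lossless_failures = roundtrip_failures(corpus, toy_parse, toy_render)
lossy_failures = roundtrip_failures(corpus, toy_parse, lossy_render)
```

With the lossless pair, `lossless_failures` is empty. With `lossy_render`, both inputs fail, because collapsing whitespace changes the bytes even though the SQL means the same thing. That is what makes byte identity a stricter check than semantic equivalence.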