dquery

An SQL compiler, generated from one grammar.

15 dialects of SQL — every common database, and still growing — defined once as a declarative grammar. A code generator emits the parser, renderer, and a full semantic layer — types, column lineage, optimization. Engineered for performance and precision, so your product doesn't need a SQL compiler team.

15 Dialects
177k+ Tests
99.7% Roundtrip Pass Rate
44µs Per File
What it is

An SQL compiler, generated from a declarative grammar.

Not a hand-written parser, and not a wrapper around a database. SQL is described once as pure data — a declarative grammar — and a code generator emits the entire engine from it. That architecture is where the performance and the precision come from.

1. One declarative grammar

Every SQL construct is written once as pure data — no code, no functions — so the whole grammar can be analyzed statically. 15 dialects share it through inheritance: PostgreSQL extends ANSI, DuckDB extends PostgreSQL. Adding the next one is a grammar change, not a new parser — which is why coverage keeps growing.
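To make the inheritance idea concrete, here is a minimal sketch of a grammar held as plain data, with dialects that fall back to their parent for any rule they do not override. The `Dialect` class, the rule names, and the rule contents are all hypothetical and heavily simplified; this is not Datoria's grammar format.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Dialect:
    """A dialect is a name, an optional parent, and rule overrides as plain data."""
    name: str
    parent: Optional["Dialect"] = None
    rules: dict = field(default_factory=dict)  # rule name -> tuple of symbols

    def lookup(self, rule):
        # Walk up the inheritance chain until some dialect defines the rule.
        dialect = self
        while dialect is not None:
            if rule in dialect.rules:
                return dialect.rules[rule]
            dialect = dialect.parent
        raise KeyError(rule)


# Toy rules for illustration only.
ansi = Dialect("ansi", rules={
    "select": ("SELECT", "select_list", "FROM", "table_ref"),
    "row_limit": ("FETCH", "FIRST", "number", "ROWS", "ONLY"),
})
postgresql = Dialect("postgresql", parent=ansi, rules={
    "row_limit": ("LIMIT", "number"),
})
duckdb = Dialect("duckdb", parent=postgresql)

print(duckdb.lookup("row_limit"))  # resolved from postgresql
print(duckdb.lookup("select"))     # resolved from ansi
```

Because the rules are data rather than functions, a tool can enumerate every rule and every override without executing anything, which is what makes static analysis of the whole grammar possible.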

2. The parser is generated, not written

The grammar is the source of truth. A language-agnostic code generator reads it and emits the entire parser, lexer, typed AST, and renderer — none of it hand-written. Today it's a Java library, callable from any JVM language; Rust is next, with Python and TypeScript wrappers likely to follow. One generator, 32× amplification: 3k+ lines of it produce 127k+ lines of parser.

3. A semantic layer on top

Above the parser: scope resolution, type inference, column-level lineage, and a multi-pass optimizer — all dialect-agnostic. Parse PostgreSQL, analyze it neutrally, render it as Snowflake.
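As an illustration of what dialect-neutral type inference with nullability tracking involves, here is a small sketch over a toy expression tree. The tuple-based expression encoding, the `SqlType` class, and the three-type numeric ladder are assumptions made for this example, not the engine's actual representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SqlType:
    name: str
    nullable: bool


# Hypothetical widening order for implicit numeric coercion.
NUMERIC_RANK = {"INTEGER": 0, "BIGINT": 1, "DOUBLE": 2}


def widen(a, b):
    """Return the wider of two numeric type names."""
    return a if NUMERIC_RANK[a] >= NUMERIC_RANK[b] else b


def infer(expr, schema):
    """Infer the type of a toy expression: ('col', name), ('lit', type), ('add', l, r), ('coalesce', ...)."""
    kind = expr[0]
    if kind == "col":
        return schema[expr[1]]
    if kind == "lit":
        return SqlType(expr[1], nullable=False)
    if kind == "add":
        left, right = infer(expr[1], schema), infer(expr[2], schema)
        # Arithmetic yields NULL if either operand is NULL.
        return SqlType(widen(left.name, right.name), left.nullable or right.nullable)
    if kind == "coalesce":
        args = [infer(arg, schema) for arg in expr[1:]]
        name = args[0].name
        for arg in args[1:]:
            name = widen(name, arg.name)
        # COALESCE is nullable only if every argument is nullable.
        return SqlType(name, all(arg.nullable for arg in args))
    raise ValueError(f"unsupported expression kind: {kind}")


schema = {
    "orders.qty": SqlType("INTEGER", nullable=False),
    "orders.discount": SqlType("DOUBLE", nullable=True),
}
total = infer(("add", ("col", "orders.qty"), ("col", "orders.discount")), schema)
safe = infer(("coalesce", ("col", "orders.discount"), ("lit", "DOUBLE")), schema)
print(total, safe)
```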

Performance: 44µs / file

A JVM library that out-parses native code: 2.6x faster than sqlparser-rs (Rust) and 5.0x faster than libpg_query, the C parser inside PostgreSQL. Generated code, not interpreted — zero backtracking, with dispatch trees computed statically from the grammar.

  • Faster than hand-written parsers because the structure is decided at generation time, not at runtime.
  • A Java library today, callable from any JVM language; Rust is next, with native and WebAssembly builds underway.
  • See the native edition for the native and WebAssembly benchmarks.
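The idea of deciding structure at generation time can be sketched as follows: the first token of each alternative is collected once, up front, into a lookup table, so choosing a rule at parse time is a single dictionary access with nothing to backtrack over. The rule names and the single-level table are simplifications assumed for this example; a generated dispatch tree would nest the same idea across several tokens.

```python
# Toy grammar: each alternative is a sequence of symbols, written as plain data.
ALTERNATIVES = {
    "select_stmt": ("SELECT", "select_list", "FROM", "table"),
    "insert_stmt": ("INSERT", "INTO", "table", "VALUES", "row"),
    "delete_stmt": ("DELETE", "FROM", "table"),
}


def build_dispatch(alternatives):
    """Run once, ahead of parsing: map each leading token to the rule it selects."""
    table = {}
    for name, symbols in alternatives.items():
        first = symbols[0]
        if first in table:
            # A conflict is reported when the table is built, not discovered while parsing.
            raise ValueError(f"{first} is ambiguous between {table[first]} and {name}")
        table[first] = name
    return table


DISPATCH = build_dispatch(ALTERNATIVES)


def choose_rule(tokens):
    """Pick the rule from the first token alone; returns None if no rule applies."""
    return DISPATCH.get(tokens[0].upper())


print(choose_rule(["delete", "from", "t"]))
```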
Precision: 99.7% pass

Lossless roundtripping: parse and re-render byte-for-byte, every token preserved — keywords, punctuation, even whitespace and comments.

  • 177,197 identity tests from 34 independent sources, parsed, rendered, and re-parsed.
  • Fully typed AST — 5k+ immutable node types, not untyped dictionaries.
  • Column lineage and type inference that survive CTEs, subqueries, and window functions.
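Lossless roundtripping depends on keeping whitespace and comments as tokens instead of discarding them. The sketch below shows that property on a toy tokenizer built with Python's `re` module; the token categories are assumptions for this example and far coarser than a real SQL lexer.

```python
import re

# Every character of the input falls into exactly one token, including trivia.
TOKEN = re.compile(r"""
    (?P<space>\s+)
  | (?P<comment>--[^\n]*|/\*.*?\*/)
  | (?P<string>'(?:[^']|'')*')
  | (?P<word>\w+)
  | (?P<punct>.)
""", re.VERBOSE | re.DOTALL)


def tokenize(sql):
    """Split SQL into (kind, text) pairs without dropping anything."""
    return [(match.lastgroup, match.group()) for match in TOKEN.finditer(sql)]


def render(tokens):
    """Re-render by concatenating token text; trivia tokens restore the layout."""
    return "".join(text for _, text in tokens)


sql = "SELECT  a, -- keep me\n  'it''s' /* x */ FROM t;"
tokens = tokenize(sql)
significant = [text for kind, text in tokens if kind not in ("space", "comment")]

# The identity check: render(parse(sql)) must equal the input byte for byte.
assert render(tokens) == sql
print(significant)
```

A consumer that only cares about structure can filter out the `space` and `comment` tokens, as `significant` does, while the renderer still has everything it needs to reproduce the original text.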
Try it

Types, lineage, and AST, live

Edit the SQL below. The analyzer returns inferred types (with nullability, VARIANT, and struct types), column-level lineage through CTEs and JOINs, and the full typed AST. Switch dialects to see how parsing differs.

What you can build

Anything that has to understand SQL.

The hard part — reading SQL precisely, across 15 dialects, with types and lineage — is done. Anything that has to read, check, transform, or trace SQL becomes a product you can build on top. A few of the directions it opens:

AI agents & copilots

Give an agent real tools for SQL: a column’s full lineage to reason about a query, formatting, and verification of the SQL it generates — checked before it ever runs.

Dialect migration

Datoria parses every major dialect today — Snowflake, Redshift, BigQuery, MySQL. Migration becomes paste-and-go for your users, not a six-month consulting engagement.

Pipeline & dbt lineage

Stitch every transformation — raw, templated, or dbt — into one typed graph. Trace any column end to end, and catch type and breaking changes before they ship.

Query-log analysis

Which of last week’s 80,000 queries touch users.email? Seconds, not a manual audit. Pattern mining, regression detection, and BI lineage all fall out of it.

Smarter SQL editors

Column-aware autocomplete, semantic errors, jump-to-definition. The understanding layer ships today — wire it into VS Code, JetBrains, or your own IDE.

Schema evolution

Before a migration applies, flag the views that will break, the policy referencing a renamed column, the ALTER that isn’t type-compatible. Before, not after.

The first product built on it is dfmt, our free SQL formatter.

Explore dialects

177,197 tests. 99.7% pass rate.

Each test parses, renders, and re-parses a real SQL statement and checks the result is byte-identical. The corpus comes from 34 independent sources, including PostgreSQL (pg_regress), DuckDB, Apache Spark, Trino, Google ZetaSQL, CockroachDB, SQLGlot, and SQLFluff. The 15 dialects cover every common database, with a sixteenth landing soon — each new one is a grammar change, not a rewrite. Pick a dialect to open the playground or browse its grammar.

Performance

44µs per file

22.9x faster than SQLGlot. No backtracking, linear time, JIT-friendly generated code.

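| Parser | Time per file | Time vs. Datoria |
| --- | --- | --- |
| Druid (JVM) (12/15) | 27µs | 0.6x |
| Datoria (JVM) | 44µs | 1.0x |
| jOOQ (JVM) | 103µs | 2.4x |
| sqlparser-rs | 114µs | 2.6x |
| Polyglot | 115µs | 2.6x |
| sqloxide | 168µs | 3.8x |
| Trino (JVM) (14/15) | 175µs | 4.0x |
| Flink SQL (JVM) (6/15) | 200µs | 4.6x |
| Spark (JVM) | 208µs | 4.8x |
| pg_query | 217µs | 5.0x |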

1000 iterations, 15 files, Apple M1 Max.
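A per-file figure like the ones above can be measured with a loop of the following shape: parse every file in the set, repeat for a fixed number of iterations, and divide the elapsed time by the number of parses. This is a generic sketch, not the harness behind the published numbers; `str.split` stands in for a parser, and the two file contents are made up.

```python
import time


def mean_time_per_file_us(parse, files, iterations=1000):
    """Average wall-clock microseconds per parse over iterations * len(files) calls."""
    start = time.perf_counter_ns()
    for _ in range(iterations):
        for text in files:
            parse(text)
    elapsed_ns = time.perf_counter_ns() - start
    return elapsed_ns / (iterations * len(files)) / 1_000


def relative(candidate_us, baseline_us):
    """Express a candidate's time as a multiple of the baseline, to one decimal place."""
    return round(candidate_us / baseline_us, 1)


files = ["SELECT 1", "SELECT a, b FROM t WHERE a > 1"]  # placeholder inputs
baseline = mean_time_per_file_us(str.split, files, iterations=100)
print(f"{baseline:.3f} µs per file")
```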

The stack

From grammar to column lineage in one pipeline

Grammar DSL
One grammar definition with dialect predicates. Pure data, no embedded functions.
Code Generator
Generates parser, lexer, 5,988 immutable typed AST interfaces, and a renderer per dialect.
dbt & Jinja
Full Jinja evaluator with ref(), source(), var(), config(). Compiles 59 public dbt projects (9,925 models) end-to-end.
Scope Resolution
Fully qualified references across nested CTEs, subqueries, and lateral joins.
Type Inference
20+ rule types, 134 function signatures, dialect-aware coercion, nullable tracking.
Column Lineage
One-pass DAG traces columns through JOINs, CTEs, window functions, and star expansion. Tested across 59 public dbt projects (9,925 models).
Error Recovery
Partial parses with precise error positions. Never crashes on broken or incomplete SQL.
Query Optimizer
15-pass pipeline: qualify, simplify, pushdown, unnest, merge, eliminate.
SQL Formatter
Adaptive formatting with trivia-signal detection. 20 config options, Jinja-aware.
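To show what tracing a column through a CTE and a JOIN amounts to, here is a minimal lineage sketch. The direct dependencies are written out by hand for one example query; in a real pipeline they would come from scope resolution. The `DIRECT` mapping and the `base_columns` helper are hypothetical names used only for this illustration.

```python
from functools import lru_cache

# Example query:
#   WITH paid AS (SELECT o.user_id, o.amount * o.fx_rate AS usd FROM orders o)
#   SELECT u.email, p.usd FROM users u JOIN paid p ON p.user_id = u.id
#
# Each output column maps to the columns it is computed from directly.
DIRECT = {
    ("paid", "user_id"): [("orders", "user_id")],
    ("paid", "usd"): [("orders", "amount"), ("orders", "fx_rate")],
    ("result", "email"): [("users", "email")],
    ("result", "usd"): [("paid", "usd")],
}


@lru_cache(maxsize=None)
def base_columns(column):
    """Resolve a column to the base-table columns it ultimately depends on."""
    sources = DIRECT.get(column)
    if sources is None:
        return frozenset([column])  # not derived from anything: a base-table column
    resolved = frozenset()
    for source in sources:
        resolved |= base_columns(source)
    return resolved


print(sorted(base_columns(("result", "usd"))))
```

Memoizing `base_columns` means each node of the dependency graph is resolved once, so the cost of a full trace grows with the number of dependency edges instead of the number of paths through them.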
dbt & Jinja

59 public dbt projects. 9,925 models. Zero errors.

We don't shell out to dbt or rely on regex. We wrote our own Jinja parser, evaluator, and dbt project loader, because you can't analyze a template you can't evaluate.

Jinja parser and evaluator

A purpose-built Jinja engine with its own generated parser (same grammar IR as the SQL parser). Handles macros, filters, control flow, and nested expressions, with proper scoping.

dbt project loader

Reads dbt_project.yml, resolves ref() and source(), loads seeds and schemas, and builds the DAG. Every model is compiled to pure SQL, then parsed, type-checked, and traced for lineage.
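The loader's core loop (evaluate `ref()` and `source()`, record dependencies, order the models) can be sketched in a few lines. The version below handles only simple `{{ call(...) }}` expression tags with literal arguments, parses each call with Python's `ast` module, and orders models with `graphlib`; the model names and SQL are invented, and real Jinja features such as macros and control flow are out of scope.

```python
import ast
from graphlib import TopologicalSorter

# Hypothetical two-model project.
MODELS = {
    "stg_orders": "SELECT * FROM {{ source('shop', 'orders') }}",
    "orders_daily": "SELECT day, count(*) AS n FROM {{ ref('stg_orders') }} GROUP BY day",
}


def compile_model(sql):
    """Replace each {{ ref(...) }} / {{ source(...) }} tag and collect ref() dependencies."""
    parts, deps, pos = [], set(), 0
    while (start := sql.find("{{", pos)) != -1:
        end = sql.index("}}", start)
        # Parse the tag body as an expression instead of pattern-matching on its text.
        call = ast.parse(sql[start + 2:end].strip(), mode="eval").body
        args = [ast.literal_eval(arg) for arg in call.args]
        if call.func.id == "ref":
            deps.add(args[0])
            relation = args[0]
        elif call.func.id == "source":
            relation = ".".join(args)
        else:
            raise ValueError(f"unsupported call: {call.func.id}")
        parts.append(sql[pos:start] + relation)
        pos = end + 2
    parts.append(sql[pos:])
    return "".join(parts), deps


compiled = {name: compile_model(sql) for name, sql in MODELS.items()}
graph = {name: deps for name, (_, deps) in compiled.items()}
order = list(TopologicalSorter(graph).static_order())  # dependencies come first
print(order)
print(compiled["orders_daily"][0])
```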

Adapter support

dbt-utils, dbt-expectations, and the core adapter macros for Snowflake, BigQuery, PostgreSQL, Redshift, Spark, Databricks, and more. Each project declares its dialect, and the matching parser handles the compiled SQL.

Runtime macros

Some dbt macros call the warehouse at build time (run_query, execute). We analyze the SQL they intend to run statically, so your pipeline doesn't need live database credentials.

Grammar

One grammar, 15 dialects

Every SQL construct is defined once in a declarative grammar, with dialect predicates picking the variants. Click any element to drill in. Switch dialects to see how the syntax varies.

Comparison

How it stacks up

The other open-source SQL compilers (SQLGlot, SDF, SQLMesh) are now owned by Fivetran or dbt Labs. Datoria is the one that isn't, and the only engine pairing this dialect breadth with full semantic analysis.

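| | Datoria | SQLGlot (Python, Fivetran-owned) | SDF/Fusion (Rust, dbt Labs-owned) | Calcite (Java, Apache) | jOOQ (Java, commercial) |
| --- | --- | --- | --- | --- | --- |
| SQL dialects | 15, growing | 24+ | 4 | 1 (ANSI) | 20+ |
| Roundtripping | Lossless | Lossy | Lossy | Lossy | Lossy |
| Typed AST | 1k+ shared + 4k+ dialect types | Untyped dict | Rust structs | Java classes | Java DSL |
| Column lineage | Yes (O(n)) | Limited | Yes | No | No |
| Type inference | Dialect-aware | Limited | Yes | Yes | Yes |
| dbt / Jinja | Built-in | No | Built-in | No | No |
| Error recovery | Yes | No | No | No | No |
| SQL formatter | Adaptive | Basic | No | No | No |
| Independence | Yes | Fivetran | dbt Labs | Apache | Data Geekery |

Parse speed (four values listed for the five columns, in column order): 44µs, 999µs, 221µs, 103µs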
Coming soon · Native

18.4× faster than PostgreSQL's own parser

The grammar is the product. Today our IR emits Java, but the IR itself is language-agnostic. The native backend emits machine code and WebAssembly from the same source of truth — and you can run it in your browser right now.

18.4× faster than libpg_query, PostgreSQL's own parser
10.9× faster than sqlparser-rs (Rust)
207.6 MB/s PostgreSQL parse · Apple M5 Pro
669 KB WebAssembly, running in your browser

Compact artifacts: a 3.7 MB native binary and a 669 KB WASM module, both from the same 15-dialect grammar — same semantic analysis, same dbt/Jinja pipeline, without the JVM.

The plan is to ship the core parser and AST as native code, then generate per-language AST bindings on top, so Node, Python, Rust, Go, edge runtimes, and browsers can all run native-speed parsing with a native type system.

Try the live in-browser parser and see the full benchmarks →

Or get early access to the engine.

Two ways in.

Try the formatter now. If you want the full engine, we're opening early access to a small number of companies, so get in touch.