15 dialects of SQL — every common database, and still growing — defined once as a declarative grammar. A code generator emits the parser, renderer, and a full semantic layer — types, column lineage, optimization. Engineered for performance and precision, so your product doesn't need a SQL compiler team.
Not a hand-written parser, and not a wrapper around a database. SQL is described once as pure data — a declarative grammar — and a code generator emits the entire engine from it. That architecture is where the performance and the precision come from.
Every SQL construct is written once as pure data — no code, no functions — so the whole grammar can be analyzed statically. 15 dialects share it through inheritance: PostgreSQL extends ANSI, DuckDB extends PostgreSQL. Adding the next one is a grammar change, not a new parser — which is why coverage keeps growing.
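To make the inheritance idea concrete, here is a minimal sketch of dialects as plain data with an optional parent. It is not Datoria's actual grammar format: the `Dialect` record, the rule names, and the rule bodies are all hypothetical and chosen only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class GrammarInheritanceDemo {
    // Hypothetical model: a dialect is data only, with a name, an optional
    // parent, and the rules it adds or overrides.
    record Dialect(String name, Dialect parent, Map<String, String> rules) {

        // Effective grammar: inherited rules first, then this dialect's own.
        Map<String, String> effectiveRules() {
            Map<String, String> merged = new LinkedHashMap<>();
            if (parent != null) {
                merged.putAll(parent.effectiveRules());
            }
            merged.putAll(rules);
            return merged;
        }
    }

    // Illustrative rule bodies, not a real grammar.
    static final Dialect ANSI = new Dialect("ansi", null, Map.of(
            "cast", "CAST '(' expr AS type ')'",
            "limit", "FETCH FIRST n ROWS ONLY"));

    static final Dialect POSTGRES = new Dialect("postgresql", ANSI, Map.of(
            "limit", "LIMIT n",
            "cast_shorthand", "expr '::' type"));

    static final Dialect DUCKDB = new Dialect("duckdb", POSTGRES, Map.of(
            "select_exclude", "'*' EXCLUDE '(' columns ')'"));

    public static void main(String[] args) {
        DUCKDB.effectiveRules()
              .forEach((rule, body) -> System.out.println(rule + " := " + body));
    }
}
```

In this sketch DuckDB declares a single rule of its own and picks up everything else from PostgreSQL and ANSI, which is the sense in which a new dialect is a grammar change rather than a new parser.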
The grammar is the source of truth. A language-agnostic code generator reads it and emits the entire parser, lexer, typed AST, and renderer — none of it hand-written. Today it's a Java library, callable from any JVM language; Rust is next, with Python and TypeScript wrappers likely to follow. One generator, 32× amplification: 3k+ lines of it produce 127k+ lines of parser.
Above the parser: scope resolution, type inference, column-level lineage, and a multi-pass optimizer — all dialect-agnostic. Parse PostgreSQL, analyze it neutrally, render it as Snowflake.
A JVM library that out-parses native code: 2.6x faster than sqlparser-rs (Rust) and 5.0x faster than libpg_query, the C parser inside PostgreSQL. Generated code, not interpreted — zero backtracking, with dispatch trees computed statically from the grammar.
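As a rough picture of what "zero backtracking" means, the sketch below picks a grammar rule by looking at one token, then at most one more, and never tries an alternative and rewinds. It is a hand-written toy, not generated output: the class name, the rule names, and the whitespace split standing in for a lexer are all assumptions.

```java
import java.util.Locale;

class DispatchSketch {
    // Chooses a rule from a fixed two-level dispatch tree. A generator could
    // compute such a tree from the grammar ahead of time.
    static String classify(String sql) {
        // Stand-in lexer: split on whitespace (a real lexer does far more).
        String[] tokens = sql.trim().toUpperCase(Locale.ROOT).split("\\s+");
        String second = tokens.length > 1 ? tokens[1] : "<end of input>";
        return switch (tokens[0]) {
            case "SELECT" -> "select_statement";
            case "INSERT" -> "insert_statement";
            // Second level of the tree: one more token of lookahead.
            case "CREATE" -> switch (second) {
                case "TABLE" -> "create_table";
                case "VIEW" -> "create_view";
                default -> throw new IllegalArgumentException(
                        "unexpected token after CREATE: " + second);
            };
            default -> throw new IllegalArgumentException(
                    "unexpected token: " + tokens[0]);
        };
    }

    public static void main(String[] args) {
        System.out.println(classify("CREATE VIEW v AS SELECT 1")); // prints create_view
    }
}
```

Each input token is inspected a bounded number of times, which is what keeps this style of parsing linear in the length of the input.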
Lossless roundtripping: parse and re-render byte-for-byte, every token preserved — keywords, punctuation, even whitespace and comments.
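The roundtrip property can be shown with a toy lexer that keeps whitespace and comments as tokens instead of discarding them. This is only a sketch of the invariant, not the engine's lexer: the token kinds and class name are hypothetical, and it makes no attempt to handle string literals or block comments.

```java
import java.util.ArrayList;
import java.util.List;

class LosslessRoundtrip {
    enum Kind { WORD, WHITESPACE, COMMENT, SYMBOL }

    record Token(Kind kind, String text) {}

    // Splits the input into tokens that together cover every character.
    static List<Token> lex(String sql) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < sql.length()) {
            int start = i;
            char c = sql.charAt(i);
            Kind kind;
            if (Character.isWhitespace(c)) {
                while (i < sql.length() && Character.isWhitespace(sql.charAt(i))) i++;
                kind = Kind.WHITESPACE;
            } else if (sql.startsWith("--", i)) {
                // Line comment: runs up to, but not including, the newline.
                while (i < sql.length() && sql.charAt(i) != '\n') i++;
                kind = Kind.COMMENT;
            } else if (Character.isLetterOrDigit(c) || c == '_') {
                while (i < sql.length()
                        && (Character.isLetterOrDigit(sql.charAt(i)) || sql.charAt(i) == '_')) i++;
                kind = Kind.WORD;
            } else {
                i++; // any other character becomes a one-character symbol
                kind = Kind.SYMBOL;
            }
            tokens.add(new Token(kind, sql.substring(start, i)));
        }
        return tokens;
    }

    // Rendering is concatenation, so nothing can be lost or reordered.
    static String render(List<Token> tokens) {
        StringBuilder out = new StringBuilder();
        for (Token t : tokens) out.append(t.text());
        return out.toString();
    }

    public static void main(String[] args) {
        String sql = "SELECT  a, -- keep me\n\tb FROM t;";
        System.out.println(render(lex(sql)).equals(sql)); // prints true
    }
}
```

Because whitespace and comments are tokens like any other, re-rendering reproduces the input exactly. A full engine has to carry the same information through its AST, which this sketch does not attempt.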
Edit the SQL below. The analyzer returns inferred types (with nullability, VARIANT, and struct types), column-level lineage through CTEs and JOINs, and the full typed AST. Switch dialects to see how parsing differs.
The hard part — reading SQL precisely, across 15 dialects, with types and lineage — is done. Anything that has to read, check, transform, or trace SQL becomes a product you can build on top. A few of the directions it opens:
Give an agent real tools for SQL: a column’s full lineage to reason about a query, formatting, and verification of the SQL it generates — checked before it ever runs.
Datoria parses every major dialect today, including Snowflake, Redshift, BigQuery, and MySQL. Migration becomes paste-and-go for your users, not a six-month consulting engagement.
Stitch every transformation — raw, templated, or dbt — into one typed graph. Trace any column end to end, and catch type and breaking changes before they ship.
Which of last week’s 80,000 queries touch users.email? Seconds, not a manual audit. Pattern mining, regression detection, and BI lineage all fall out of it.
Column-aware autocomplete, semantic errors, jump-to-definition. The understanding layer ships today — wire it into VS Code, JetBrains, or your own IDE.
Before a migration applies, flag the views that will break, the policy referencing a renamed column, the ALTER that isn’t type-compatible. Before, not after.
The first product built on it is dfmt, our free SQL formatter.
Each test parses, renders, and re-parses a real SQL statement and checks the result is byte-identical. The corpus comes from 34 independent sources, including PostgreSQL (pg_regress), DuckDB, Apache Spark, Trino, Google ZetaSQL, CockroachDB, SQLGlot, and SQLFluff. The 15 dialects cover every common database, with a sixteenth landing soon — each new one is a grammar change, not a rewrite. Pick a dialect to open the playground or browse its grammar.
22.9x faster than SQLGlot. No backtracking, linear time, JIT-friendly generated code.
1000 iterations, 15 files, Apple M1 Max.
We don't shell out to dbt or rely on regex. We wrote our own Jinja parser, evaluator, and dbt project loader, because you can't analyze a template you can't evaluate.
A purpose-built Jinja engine with its own generated parser (same grammar IR as the SQL parser). Handles macros, filters, control flow, and nested expressions, with proper scoping.
Reads dbt_project.yml, resolves ref() and source(), loads seeds and schemas, and builds the DAG. Every model is compiled to pure SQL, then parsed, type-checked, and traced for lineage.
dbt-utils, dbt-expectations, and the core adapter macros for Snowflake, BigQuery, PostgreSQL, Redshift, Spark, Databricks, and more. Each project declares its dialect, and the matching parser handles the compiled SQL.
Some dbt macros call the warehouse at build time (run_query, execute). We analyze the SQL they intend to run statically, so your pipeline doesn't need live database credentials.
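The DAG-building step mentioned above can be pictured as a topological sort over models and the models they reference. The sketch assumes the `ref()` calls have already been resolved into a plain dependency map; the class name, method name, and model names are hypothetical, and this is not the project loader itself.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ModelDag {
    // deps maps each model to the models it references via ref().
    // Returns an order in which every model comes after its dependencies.
    static List<String> buildOrder(Map<String, List<String>> deps) {
        Map<String, Integer> unmet = new TreeMap<>();        // unmet dependency count
        Map<String, List<String>> dependents = new HashMap<>();
        for (Map.Entry<String, List<String>> e : deps.entrySet()) {
            unmet.put(e.getKey(), e.getValue().size());
            for (String dep : e.getValue()) {
                dependents.computeIfAbsent(dep, k -> new ArrayList<>()).add(e.getKey());
            }
        }

        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : unmet.entrySet()) {
            if (e.getValue() == 0) ready.add(e.getKey());
        }

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String model = ready.poll();
            order.add(model);
            for (String next : dependents.getOrDefault(model, List.of())) {
                // Decrement the count; the model is ready once it reaches zero.
                if (unmet.merge(next, -1, Integer::sum) == 0) ready.add(next);
            }
        }
        if (order.size() != deps.size()) {
            throw new IllegalStateException("cycle or reference to an unknown model");
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(buildOrder(Map.of(
                "stg_users", List.of(),
                "stg_orders", List.of(),
                "orders_enriched", List.of("stg_users", "stg_orders"))));
    }
}
```

Compiling, parsing, and type-checking each model in this order means every upstream model has been processed before anything that depends on it.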
Every SQL construct is defined once in a declarative grammar, with dialect predicates picking the variants. Click any element to drill in. Switch dialects to see how the syntax varies.
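One way to picture dialect predicates is a single construct listing several syntax variants, each tagged with the dialects that accept it. The sketch below does this for casts. The `Variant` record and the dialect sets are illustrative assumptions, deliberately incomplete, and not taken from the actual grammar.

```java
import java.util.List;
import java.util.Set;

class DialectPredicates {
    // Hypothetical: one syntax variant plus the dialects whose predicate matches.
    record Variant(String syntax, Set<String> dialects) {}

    // A single "cast" construct, defined once, with per-dialect variants.
    static final List<Variant> CAST_VARIANTS = List.of(
            new Variant("CAST(expr AS type)",
                    Set.of("ansi", "postgresql", "snowflake", "bigquery")),
            new Variant("expr::type",
                    Set.of("postgresql", "snowflake")),
            new Variant("SAFE_CAST(expr AS type)",
                    Set.of("bigquery")));

    // Returns the variants a given dialect accepts, in declaration order.
    static List<String> variantsFor(String dialect) {
        return CAST_VARIANTS.stream()
                .filter(v -> v.dialects().contains(dialect))
                .map(Variant::syntax)
                .toList();
    }

    public static void main(String[] args) {
        System.out.println(variantsFor("bigquery"));
        // prints [CAST(expr AS type), SAFE_CAST(expr AS type)]
    }
}
```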
The other open-source SQL compilers (SQLGlot, SDF, SQLMesh) are now owned by Fivetran or dbt Labs. Datoria is the one that isn't, and the only engine pairing this dialect breadth with full semantic analysis.
| | Datoria | SQLGlot (Python, Fivetran-owned) | SDF/Fusion (Rust, dbt Labs-owned) | Calcite (Java, Apache) | jOOQ (Java, commercial) |
|---|---|---|---|---|---|
| SQL dialects | 15, growing | 24+ | 4 | 1 (ANSI) | 20+ |
| Parse speed | 44µs | 999µs | — | 221µs | 103µs |
| Roundtripping | Lossless | Lossy | Lossy | Lossy | Lossy |
| Typed AST | 1k+ shared + 4k+ dialect types | Untyped dict | Rust structs | Java classes | Java DSL |
| Column lineage | Yes (O(n)) | Limited | Yes | No | No |
| Type inference | Dialect-aware | Limited | Yes | Yes | Yes |
| dbt / Jinja | Built-in | No | Built-in | No | No |
| Error recovery | Yes | No | No | No | No |
| SQL formatter | Adaptive | Basic | No | No | No |
| Independence | Yes | Fivetran | dbt Labs | Apache | Data Geekery |
The grammar is the product. Today our IR emits Java, but the IR itself is language-agnostic. The native backend emits machine code and WebAssembly from the same source of truth — and you can run it in your browser right now.
Compact artifacts: a 3.7 MB native binary and a 669 KB WASM module, both from the same 15-dialect grammar — same semantic analysis, same dbt/Jinja pipeline, without the JVM.
The plan is to ship the core parser and AST as native code, then generate per-language AST bindings on top, so Node, Python, Rust, Go, edge runtimes, and browsers can all run native-speed parsing with a native type system.
Try the live in-browser parser and see the full benchmarks →
Or get early access to the engine.
Try the formatter now. If you want the full engine, we're opening early access to a small number of companies, so get in touch.