Why Datoria
There are many SQL parsers. Most solve a subset of the problem well. Here's how Datoria compares to the alternatives -- and why you might choose it.
The Technical Gap
Before comparing features, it's worth understanding why the alternatives fall short. The issue isn't just who owns them -- it's what they can't do:
- SQLGlot is written in Python (18x slower than compiled parsers) and produces lossy output: it can drop comments and normalizes whitespace and keyword casing. A SIGMOD 2025 paper found a 40% translation error rate on certain dialect pairs. Its error recovery is basic, so broken SQL often produces no usable output.
- SDF/Fusion supports 4 dialects to Datoria's 15 and produces lossy output (its DataFusion logical plan discards syntax). Its SQL parser is also proprietary: the dbt-fusion repo publishes lexers and Jinja support, but not the SQL grammars or the parser itself.
- Apache Calcite parses a subset of standard SQL plus limited extensions. It can't handle BigQuery, Snowflake, or T-SQL syntax natively.
- jOOQ is a DSL-first product whose parser reformats output -- designed for translation, not preservation.
These are architectural limitations, not feature gaps, and they cannot be fixed incrementally. In addition, the two most popular options (SQLGlot and SDF) are set to come under a single competitor's ownership once the announced Fivetran/dbt Labs merger closes.
Overview
| Capability | Datoria | SQLGlot | SDF / dbt Fusion | Apache Calcite | jOOQ |
|---|---|---|---|---|---|
| Dialects | 15 (single grammar) | 31 (hand-written) | 4 (ANTLR) | ~10 (extensible) | 30+ (hand-written) |
| Grammar approach | Generated from IR | Hand-coded Python | ANTLR grammars | JavaCC + Freemarker | Hand-coded Java |
| Roundtripping | Lossless (byte-identical) | Lossy (reformats) | Lossy (logical plan) | Lossy | Lossy |
| Type inference | Nullable-aware, dialect-specific | Basic | Full | Full (via schemas) | Full (via schemas) |
| Column lineage | Single-pass DAG, O(n) | Per-column traversal | Not built in | Not built in | Not built in |
| Error recovery | Grammar-derived, zero overhead | Basic | Unknown | None | None |
| AST type safety | 14,782 typed interfaces | Dynamic (Python classes, dict-based children) | Typed (Rust) | Typed (Java) | Typed (Java) |
| dbt support | Full Jinja + 59-project test suite | Core engine for dbt Cloud | Core engine for dbt Fusion | None | None |
| Independence | Independent | Owned by Fivetran | Owned by dbt Labs | Apache Software Foundation | Datageekery (Swiss) |
vs SQLGlot
SQLGlot is the most widely adopted SQL parser (~9M PyPI downloads/week), focused on SQL transpilation.
Where SQLGlot is strong: massive community, 31 dialect definitions, transpilation between dialects, in-memory SQL execution engine.
Where Datoria is different:
- Lossless roundtripping. SQLGlot reformats on output: it can drop comments and rewrites whitespace. Datoria preserves every byte.
- Parse accuracy on complex syntax. SQLGlot struggles with BigQuery nested STRUCTs, Snowflake FLATTEN/QUALIFY, T-SQL CROSS APPLY. A SIGMOD 2025 paper found a 40% translation error rate on certain dialect pairs.
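- Type safety. SQLGlot's AST nodes are dynamically typed Python objects whose children are stored in untyped dictionaries. Datoria's AST consists of 14,782 typed Java interfaces, implemented as immutable records with Optional<T> fields.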
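To make the contrast concrete, here is a minimal Java sketch of what a typed, immutable AST with Optional<T> looks like. The node types (Expr, ColumnRef, Literal, SelectStmt) are hypothetical and far simpler than Datoria's real interfaces. The point is that a missing WHERE clause is an Optional.empty() the compiler makes you handle, rather than a dictionary key that may or may not exist.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical, heavily simplified node types; not Datoria's actual API.
sealed interface Expr permits ColumnRef, Literal {
    String render();
}

// A column reference with an optional table qualifier, e.g. o.id or just id.
record ColumnRef(Optional<String> table, String column) implements Expr {
    public String render() {
        return table.map(t -> t + "." + column).orElse(column);
    }
}

record Literal(String text) implements Expr {
    public String render() {
        return text;
    }
}

// An optional WHERE clause is modeled as Optional<Expr>, never as null or a missing key.
record SelectStmt(List<Expr> projections, String from, Optional<Expr> where) {
    SelectStmt {
        projections = List.copyOf(projections); // defensive copy keeps the record immutable
    }
}

class TypedAstDemo {
    public static void main(String[] args) {
        SelectStmt stmt = new SelectStmt(
                List.of(new ColumnRef(Optional.of("o"), "id"), new Literal("1")),
                "orders",
                Optional.empty());
        System.out.println(stmt.projections().get(0).render()); // o.id
        System.out.println(stmt.where().isPresent());           // false
    }
}
```

Because Expr is sealed, the set of node kinds is closed and known to the compiler, and because the records are immutable, a tree can be shared between analysis passes without defensive copying.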
vs Apache Calcite
Apache Calcite is a widely used SQL framework at the core of projects such as Hive, Druid, and Flink.
Where Calcite is strong: battle-tested at massive scale, full relational algebra with cost-based optimization, extensible adapter framework.
Where Datoria is different:
- Multi-dialect parsing. Calcite parses a subset of standard SQL and can't parse BigQuery, Snowflake, or T-SQL syntax natively.
- Lossless AST. Neither Calcite's parse tree (SqlNode) nor the relational algebra tree built from it preserves the original SQL text, whitespace, or comments.
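Lossless output is usually achieved by keeping every character of the input somewhere in the tree. The sketch below shows one common technique, attaching whitespace and comments ("trivia") to the token that follows them. It is an illustration under that assumption, not Datoria's implementation, and the Token and LosslessLexer names are hypothetical. Because every input character lands in exactly one field, printing is plain concatenation and the roundtrip is byte-identical by construction.

```java
import java.util.ArrayList;
import java.util.List;

// A token plus the whitespace/comments ("trivia") that preceded it.
record Token(String leadingTrivia, String text) {}

class LosslessLexer {
    // Splits sql into tokens; every character lands in exactly one trivia or text field.
    static List<Token> lex(String sql) {
        List<Token> tokens = new ArrayList<>();
        int i = 0, n = sql.length();
        while (i < n) {
            int triviaStart = i;
            i = skipTrivia(sql, i);
            String trivia = sql.substring(triviaStart, i);
            if (i >= n) {
                tokens.add(new Token(trivia, "")); // trailing trivia kept on an end-of-input token
                break;
            }
            int start = i;
            char c = sql.charAt(i);
            if (Character.isLetterOrDigit(c) || c == '_') {
                while (i < n && (Character.isLetterOrDigit(sql.charAt(i)) || sql.charAt(i) == '_')) i++;
            } else if (c == '\'') {
                i++;
                while (i < n && sql.charAt(i) != '\'') i++;
                if (i < n) i++; // include the closing quote if present
            } else {
                i++; // single-character punctuation or operator
            }
            tokens.add(new Token(trivia, sql.substring(start, i)));
        }
        return tokens;
    }

    // Advances past whitespace, -- line comments, and /* block comments */.
    static int skipTrivia(String s, int i) {
        int n = s.length();
        while (i < n) {
            if (Character.isWhitespace(s.charAt(i))) {
                i++;
            } else if (s.startsWith("--", i)) {
                while (i < n && s.charAt(i) != '\n') i++;
            } else if (s.startsWith("/*", i)) {
                int end = s.indexOf("*/", i + 2);
                i = (end < 0) ? n : end + 2;
            } else {
                break;
            }
        }
        return i;
    }

    // Reprints the tokens exactly as they appeared, including casing and comments.
    static String print(List<Token> tokens) {
        StringBuilder sb = new StringBuilder();
        for (Token t : tokens) sb.append(t.leadingTrivia()).append(t.text());
        return sb.toString();
    }

    public static void main(String[] args) {
        String sql = "SELECT  id, -- primary key\n  Name FROM t /* done */";
        System.out.println(print(lex(sql)).equals(sql)); // true
    }
}
```

A parser that builds its tree over these tokens, instead of discarding the trivia, can reprint any subtree unchanged.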
vs jOOQ
jOOQ is a popular Java library for type-safe SQL building and execution.
Where jOOQ is strong: excellent type-safe DSL, 30+ dialect support, robust hand-written parser, commercial product with long track record.
Where Datoria is different:
- Lossless roundtripping. jOOQ reformats output -- designed for translation, not preservation.
- Semantic analysis stack. jOOQ's parser exists to serve its DSL. Datoria builds column lineage, type inference, scope resolution, and a multi-pass optimizer on top of its parser.
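The difference between per-column traversal and a single pass over a lineage DAG comes down to reuse. When many output columns share upstream columns (a CTE feeding several SELECT items, say), computing each upstream column's sources once and reusing them avoids re-walking the same subgraph for every output. Below is a minimal memoized sketch. It assumes a hypothetical lineage graph that maps each column to the columns it is computed from and assumes the graph is acyclic. It is not Datoria's algorithm.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical lineage graph: column id -> ids of the columns it is computed from.
// Columns with no recorded inputs are treated as base-table columns.
class LineageSketch {
    // Computes the base-table sources of every column in one memoized pass over the DAG.
    static Map<String, Set<String>> sourcesForAll(Map<String, List<String>> deps) {
        Map<String, Set<String>> memo = new HashMap<>();
        for (String col : deps.keySet()) {
            resolve(col, deps, memo);
        }
        return memo;
    }

    // Depth-first resolution; each column's sources are computed once and reused downstream.
    // Assumes the graph is acyclic (a recursive CTE would need cycle handling).
    private static Set<String> resolve(String col, Map<String, List<String>> deps,
                                       Map<String, Set<String>> memo) {
        Set<String> cached = memo.get(col);
        if (cached != null) {
            return cached;
        }
        List<String> inputs = deps.getOrDefault(col, List.of());
        Set<String> sources = new TreeSet<>();
        if (inputs.isEmpty()) {
            sources.add(col); // a base column is its own source
        } else {
            for (String input : inputs) {
                sources.addAll(resolve(input, deps, memo));
            }
        }
        memo.put(col, sources);
        return sources;
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = Map.of(
                "cte.total", List.of("orders.amount", "orders.tax"),
                "out.total", List.of("cte.total"),
                "out.net", List.of("cte.total", "refunds.amount"));
        Map<String, Set<String>> lineage = sourcesForAll(deps);
        System.out.println(lineage.get("out.net")); // [orders.amount, orders.tax, refunds.amount]
    }
}
```

Each column and each dependency edge is visited once; the remaining cost is the set unions themselves.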
Build vs Buy
Building a production-grade multi-dialect SQL compiler from scratch is a significant undertaking:
- Time: 3-5 years for a team of compiler engineers
- Cost: $4.5M-$12.5M in engineering labor (3-5 compiler engineers x 3-5 years x $500K+ fully loaded)
- Talent: Compiler engineers with SQL domain expertise are vanishingly rare -- most teams spend months hiring before starting
- Scope: Grammar definition, parser generation, AST design, error recovery, 15+ dialects, semantic analysis (scoping, types, lineage), formatting, dbt integration
Datoria represents years of accumulated work across all of these areas, validated by 170,686+ identity tests (parse, print, and compare byte-for-byte with the input) drawn from real-world SQL sources.
Strategic Context
The SQL tooling landscape consolidated dramatically in 2025:
- Fivetran acquired Tobiko Data, maker of SQLGlot (the most popular open-source SQL parser) and SQLMesh
- dbt Labs acquired SDF Labs, whose Rust-based SQL compiler became the dbt Fusion engine
- Fivetran and dbt Labs announced a merger -- combining both under one entity
When this merger closes, one company will control the major SQL compiler tools: SQLGlot, SDF/Fusion, and SQLMesh. For companies that have been relying on these tools as neutral infrastructure, this creates a strategic dependency on a competitor.
Datoria is the only independent, production-grade SQL compiler covering this breadth of dialects with this depth of semantic analysis.