Why Datoria

There are plenty of SQL parsers. Most handle a subset of the problem well. Here's how Datoria stacks up against the alternatives, and where it differs.

The technical gap

Before comparing features, it's worth understanding what the alternatives can't do, regardless of who owns them:

  • SQLGlot is Python-only (about 18x slower than compiled parsers) and produces lossy output (it drops comments, whitespace, and keyword casing). A SIGMOD 2025 paper found a 40% translation error rate on certain dialect pairs. It has no error recovery, so broken SQL produces no output.
  • SDF/Fusion supports 4 dialects against our 15 and produces lossy output (the DataFusion logical plan discards syntax). Its SQL parser is proprietary: the dbt-fusion repo publishes lexers and Jinja, but the actual SQL grammars and parser are not available.
  • Apache Calcite parses a SQL-standard subset. It can't handle BigQuery, Snowflake, or T-SQL syntax natively.
  • jOOQ is a DSL-first product whose parser reformats output. It's designed for translation, not preservation.

These are architectural limitations, not feature gaps, so you can't close them incrementally. And the two most popular options (SQLGlot, SDF) are set to be owned by the same competitor once the announced Fivetran and dbt Labs merger closes.

Overview

| Capability | Datoria | SQLGlot | SDF / dbt Fusion | Apache Calcite | jOOQ |
|---|---|---|---|---|---|
| Dialects | 15 (single grammar) | 31 (hand-written) | 4 (ANTLR) | ~10 (extensible) | 30+ (hand-written) |
| Grammar approach | Generated from IR | Hand-coded Python | ANTLR grammars | JavaCC + Freemarker | Hand-coded Java |
| Roundtripping | Lossless (byte-identical) | Lossy (reformats) | Unknown | Lossy | Lossy |
| Type inference | Nullable-aware, dialect-specific | Basic | Full | Full (via schemas) | Full (via schemas) |
| Column lineage | Single-pass DAG, O(n) | Per-column traversal | Not built in | Not built in | Not built in |
| Error recovery | Grammar-derived, zero overhead | Basic | Unknown | None | None |
| AST type safety | 5,988 typed interfaces | Untyped (Python dicts) | Typed (Rust) | Typed (Java) | Typed (Java) |
| dbt support | Full Jinja + 59-project test suite | Core engine for dbt Cloud | Core engine for dbt Fusion | None | None |
| Independence | Independent | Owned by Fivetran/dbt | Owned by Fivetran/dbt | Apache Foundation | Data Geekery (Swiss) |

vs SQLGlot

SQLGlot is the most widely adopted SQL parser (~9M PyPI downloads per week), focused on transpilation.

Where SQLGlot is strong: massive community, 31 dialect definitions, transpilation between dialects, in-memory SQL execution engine.

Where Datoria differs:

  • Lossless roundtripping. SQLGlot reformats on output: it drops comments and changes whitespace. Datoria preserves every byte.
  • Parse accuracy on complex syntax. SQLGlot struggles with BigQuery nested STRUCTs, Snowflake FLATTEN/QUALIFY, and T-SQL CROSS APPLY. A SIGMOD 2025 paper found a 40% translation error rate on certain dialect pairs.
  • Type safety. SQLGlot's AST nodes are dynamically typed Python objects that store their children in a dictionary. Datoria's AST is 5,988 typed Java interfaces: immutable records with Optional<T>.
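
To make the type-safety point concrete, here is a minimal Python analogue. It is a sketch only: Datoria's AST is Java, and the `Select` and `ColumnRef` classes, their fields, and the dict layout below are hypothetical names invented for this illustration, not any library's API. Frozen dataclasses stand in for immutable records, and `Optional[...]` plays the role of `Optional<T>`.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Optional, Tuple


@dataclass(frozen=True)
class ColumnRef:
    name: str
    table: Optional[str] = None  # a missing qualifier is explicit in the type


@dataclass(frozen=True)
class Select:
    columns: Tuple[ColumnRef, ...]
    from_table: str
    where: Optional[str] = None  # hypothetical: a real AST would hold an expression node


# A dict-shaped tree accepts anything, including a misspelled key.
untyped = {"columns": [{"name": "id"}], "from": "users"}
untyped["form"] = "accounts"  # silently creates a new key

# The typed node rejects the same misspelling at construction time.
try:
    Select(columns=(ColumnRef("id"),), form_table="users")
except TypeError as exc:
    constructor_error = str(exc)

# It also rejects mutation after construction.
typed = Select(columns=(ColumnRef("id"),), from_table="users")
try:
    typed.from_table = "accounts"
    mutation_blocked = False
except FrozenInstanceError:
    mutation_blocked = True
```

Python checks the annotations only when a static type checker is run, whereas a Java compiler enforces them at build time, so this analogue understates the guarantee described above.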

vs Apache Calcite

Apache Calcite is the standard SQL framework behind Hive, Druid, and Flink.

Where Calcite is strong: battle-tested at massive scale, full relational algebra with cost-based optimization, extensible adapter framework.

Where Datoria differs:

  • Multi-dialect parsing. Calcite parses a SQL-standard subset. It can't parse BigQuery, Snowflake, or T-SQL syntax natively.
  • Lossless AST. Calcite produces a relational algebra tree. The original SQL text, whitespace, and comments are gone.
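
Lossless roundtripping can be illustrated with a tokenizer that never discards a character. The Python sketch below assumes a deliberately tiny token set and hypothetical function names (`tokenize`, `render`, `lossy_render`). It is not Datoria's lexer, only a demonstration that keeping comments and whitespace as tokens makes byte-identical output possible, while dropping them makes it impossible.

```python
import re

# Hypothetical token kinds for illustration; a real SQL lexer needs many more.
TOKEN_RE = re.compile(
    r"""
    (?P<comment>--[^\n]*|/\*.*?\*/)     # line and block comments
  | (?P<space>\s+)                      # whitespace, kept verbatim
  | (?P<string>'(?:[^']|'')*')          # single-quoted string literals
  | (?P<word>[A-Za-z_][A-Za-z_0-9]*)    # keywords and identifiers, original casing
  | (?P<number>\d+(?:\.\d+)?)           # numeric literals
  | (?P<other>.)                        # any other single character
    """,
    re.VERBOSE | re.DOTALL,
)


def tokenize(sql):
    """Split SQL into (kind, text) pairs without discarding any character."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(sql)]


def render(tokens):
    """Reassemble the original text by concatenating token texts in order."""
    return "".join(text for _, text in tokens)


def lossy_render(tokens):
    """Caricature of a reformatting printer: drop trivia, join with single spaces."""
    return " ".join(text for kind, text in tokens if kind not in ("comment", "space"))


sql = "select  a, -- keep me\n       b\nFrom t"
tokens = tokenize(sql)

assert render(tokens) == sql          # identity: output equals input
assert lossy_render(tokens) != sql    # comment and layout are gone
```

An identity test in this style simply asserts that rendering the parsed form reproduces the input exactly.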

vs jOOQ

jOOQ is a popular Java library for type-safe SQL building and execution.

Where jOOQ is strong: type-safe DSL, 30+ dialect support, robust hand-written parser, commercial product with a long track record.

Where Datoria differs:

  • Lossless roundtripping. jOOQ reformats output. It's designed for translation, not preservation.
  • Semantic analysis stack. jOOQ's parser exists to feed its DSL. Datoria adds column lineage, type inference, scope resolution, and a multi-pass optimizer on top of the parser.
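
As a rough illustration of column lineage over a dependency graph, the Python sketch below resolves every derived column to its base columns in one pass. It is a generic technique shown under simplifying assumptions, not Datoria's algorithm: the `resolve_lineage` function, the `(output, inputs)` step format, and the column names are all hypothetical, and the steps are assumed to arrive already sorted in dependency order.

```python
def resolve_lineage(steps):
    """Map each derived column to the base columns it ultimately depends on.

    `steps` is a list of (output, inputs) pairs in dependency order: every
    input is either a base column or the output of an earlier step.
    """
    sources = {}
    for output, inputs in steps:
        resolved = set()
        for col in inputs:
            # Derived columns reuse their already-resolved set;
            # base columns resolve to themselves.
            resolved |= sources.get(col, {col})
        sources[output] = frozenset(resolved)
    return sources


# Hypothetical query shape: a CTE over `orders`, then a final SELECT over the CTE.
steps = [
    ("cte.total", ["orders.price", "orders.qty"]),
    ("cte.customer", ["orders.customer_id"]),
    ("final.revenue", ["cte.total"]),
    ("final.label", ["cte.customer", "cte.total"]),
]

lineage = resolve_lineage(steps)
print(sorted(lineage["final.label"]))
# prints ['orders.customer_id', 'orders.price', 'orders.qty']
```

Because each step is visited once and earlier results are reused, no column's ancestry is walked twice, which is the difference from re-traversing the graph separately for every output column.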

Build vs buy

Building a production-grade multi-dialect SQL compiler from scratch is a serious project:

  • Time: 3–5 years for a team of compiler engineers.
  • Cost: $4.5M–$12.5M in labor (3–5 compiler engineers × 3–5 years × $500K+ fully loaded).
  • Talent: Compiler engineers with SQL domain expertise are rare. Most teams spend months hiring before they start.
  • Scope: Grammar definition, parser generation, AST design, error recovery, 15+ dialects, semantic analysis (scoping, types, lineage), formatting, dbt integration.
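
The cost range above follows directly from multiplying the stated team size, duration, and per-engineer cost. The short Python check below reproduces it. The `labor_cost` helper and constant name are illustrative only, and the $500K figure is the floor of the "$500K+" estimate, so both ends of the range are lower bounds.

```python
# Fully loaded cost per engineer-year, taken as the floor of the "$500K+" estimate.
COST_PER_ENGINEER_YEAR = 500_000


def labor_cost(engineers, years, cost_per_engineer_year=COST_PER_ENGINEER_YEAR):
    """Total labor cost in dollars for a team of a given size and duration."""
    return engineers * years * cost_per_engineer_year


low = labor_cost(engineers=3, years=3)   # smallest team, shortest timeline
high = labor_cost(engineers=5, years=5)  # largest team, longest timeline

print(f"${low / 1e6:.1f}M to ${high / 1e6:.1f}M")
# prints $4.5M to $12.5M
```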

Datoria represents years of work across all of these areas, validated by 177,197+ identity tests from real-world SQL sources.

Strategic context

The SQL tooling landscape consolidated in 2025:

  • Fivetran acquired Tobiko Data, the company behind SQLMesh and SQLGlot, the most popular open-source SQL parser.
  • dbt Labs acquired SDF Labs, a Rust-based SQL compiler that became the dbt Fusion engine.
  • Fivetran and dbt Labs announced a merger, putting both under one roof.

Once that merger closes, one company will control SQLGlot, SQLMesh, and SDF/Fusion. For companies that built on these tools as neutral infrastructure, that is a strategic dependency on a potential competitor.

Datoria is the only independent, production-grade SQL compiler covering this breadth of dialects with this depth of semantic analysis.