Why Datoria

There are many SQL parsers. Most solve a subset of the problem well. Here's how Datoria compares to the alternatives -- and why you might choose it.

The Technical Gap

Before comparing features, it's worth understanding why the alternatives fall short. The issue isn't just who owns them -- it's what they can't do:

  • SQLGlot is Python-only (18x slower than compiled parsers) and produces lossy output (it drops comments, whitespace, and keyword casing). A SIGMOD 2025 paper found a 40% translation error rate on certain dialect pairs. It has no error recovery, so broken SQL produces no output.
  • SDF/Fusion supports 4 dialects versus Datoria's 15 and produces lossy output (the DataFusion logical plan discards syntax). Its SQL parser is proprietary: the dbt-fusion repo publishes lexers and Jinja, but the actual SQL grammars and parser are not available.
  • Apache Calcite parses a SQL standard subset only -- it can't handle BigQuery, Snowflake, or T-SQL syntax natively.
  • jOOQ is a DSL-first product whose parser reformats output -- designed for translation, not preservation.

These are architectural limitations, not feature gaps, and they cannot be fixed incrementally. On top of that, the two most popular options (SQLGlot and SDF) are set to be owned by a single competitor once the announced Fivetran and dbt Labs merger closes.

Overview

| Capability | Datoria | SQLGlot | SDF / dbt Fusion | Apache Calcite | jOOQ |
| --- | --- | --- | --- | --- | --- |
| Dialects | 15 (single grammar) | 31 (hand-written) | 4 (ANTLR) | ~10 (extensible) | 30+ (hand-written) |
| Grammar approach | Generated from IR | Hand-coded Python | ANTLR grammars | JavaCC + Freemarker | Hand-coded Java |
| Roundtripping | Lossless (byte-identical) | Lossy (reformats) | Unknown | Lossy | Lossy |
| Type inference | Nullable-aware, dialect-specific | Basic | Full | Full (via schemas) | Full (via schemas) |
| Column lineage | Single-pass DAG, O(n) | Per-column traversal | Not built in | Not built in | Not built in |
| Error recovery | Grammar-derived, zero overhead | Basic | Unknown | None | None |
| AST type safety | 14,782 typed interfaces | Untyped (Python dicts) | Typed (Rust) | Typed (Java) | Typed (Java) |
| dbt support | Full Jinja + 59-project test suite | Core engine for dbt Cloud | Core engine for dbt Fusion | None | None |
| Independence | Independent | Owned by Fivetran/dbt | Owned by Fivetran/dbt | Apache Foundation | Datageekery (Swiss) |

vs SQLGlot

SQLGlot is the most widely adopted SQL parser (~9M PyPI downloads/week), focused on SQL transpilation.

Where SQLGlot is strong: massive community, 31 dialect definitions, transpilation between dialects, in-memory SQL execution engine.

Where Datoria is different:

  • Lossless roundtripping. SQLGlot reformats on output -- it drops comments and changes whitespace. Datoria preserves every byte.
  • Parse accuracy on complex syntax. SQLGlot struggles with BigQuery nested STRUCTs, Snowflake FLATTEN/QUALIFY, T-SQL CROSS APPLY. A SIGMOD 2025 paper found a 40% translation error rate on certain dialect pairs.
  • Type safety. SQLGlot's AST nodes keep their children in untyped Python dictionaries. Datoria's AST is 14,782 typed Java interfaces -- immutable records with Optional<T>.
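
To make "preserves every byte" concrete, here is a minimal sketch of the general idea behind lossless roundtripping: the lexer keeps whitespace and comments as tokens instead of discarding them, so concatenating the token texts reproduces the input exactly. This is a toy illustration, not Datoria's implementation; `TOKEN_RE`, `Token`, `tokenize`, and `render` are hypothetical names, and the token classes are deliberately simplified.

```python
import re
from dataclasses import dataclass

# Toy token classes; a real SQL lexer distinguishes many more.
# The final alternative matches any single character, so no input is ever skipped.
TOKEN_RE = re.compile(
    r"""
    (?P<comment>--[^\n]*|/\*.*?\*/)    # line and block comments, kept verbatim
  | (?P<space>\s+)                     # whitespace runs, kept verbatim
  | (?P<string>'(?:[^']|'')*')         # single-quoted string literal
  | (?P<word>[A-Za-z_][A-Za-z_0-9]*)   # keywords and identifiers, casing untouched
  | (?P<number>\d+(?:\.\d+)?)          # integer or decimal literal
  | (?P<other>.)                       # any other single character
    """,
    re.VERBOSE | re.DOTALL,
)


@dataclass(frozen=True)
class Token:
    kind: str  # name of the alternative that matched
    text: str  # exact source text of the token


def tokenize(sql: str) -> list[Token]:
    """Split SQL into tokens without dropping comments or whitespace."""
    return [Token(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(sql)]


def render(tokens: list[Token]) -> str:
    """Re-emit the source by concatenating the token texts in order."""
    return "".join(t.text for t in tokens)


sql = "select  a, -- keep me\n  B /* x */ FROM t\tWhere c = 'it''s'"
tokens = tokenize(sql)
assert render(tokens) == sql  # identical, including comments and odd spacing
```

A reformatting printer, by contrast, rebuilds the text from the tree alone, so anything the tree does not record (spacing, comments, keyword casing) cannot be recovered.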

vs Apache Calcite

Apache Calcite is the industry-standard SQL framework, used by projects such as Hive, Druid, and Flink.

Where Calcite is strong: battle-tested at massive scale, full relational algebra with cost-based optimization, extensible adapter framework.

Where Datoria is different:

  • Multi-dialect parsing. Calcite parses a SQL standard subset. It can't parse BigQuery, Snowflake, or T-SQL syntax natively.
  • Lossless AST. Calcite produces a relational algebra tree -- it doesn't preserve the original SQL text, whitespace, or comments.

vs jOOQ

jOOQ is a popular Java library for type-safe SQL building and execution.

Where jOOQ is strong: excellent type-safe DSL, 30+ dialect support, robust hand-written parser, commercial product with long track record.

Where Datoria is different:

  • Lossless roundtripping. jOOQ reformats output -- designed for translation, not preservation.
  • Semantic analysis stack. jOOQ's parser exists to serve its DSL. Datoria includes column lineage, type inference, scope resolution, and a multi-pass optimizer.

Build vs Buy

Building a production-grade multi-dialect SQL compiler from scratch is a significant undertaking:

  • Time: 3-5 years for a team of compiler engineers
  • Cost: $4.5M-$12.5M in engineering labor (3-5 compiler engineers x 3-5 years x $500K+ fully loaded)
  • Talent: Compiler engineers with SQL domain expertise are vanishingly rare -- most teams spend months hiring before starting
  • Scope: Grammar definition, parser generation, AST design, error recovery, 15+ dialects, semantic analysis (scoping, types, lineage), formatting, dbt integration
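
The cost range above follows directly from the stated headcount, duration, and rate. A small sketch of the arithmetic, using $500K as the per-engineer-year figure (the lower bound of "$500K+"); the function name `labor_cost` is only illustrative:

```python
FULLY_LOADED_COST = 500_000  # USD per engineer-year, lower bound of "$500K+"


def labor_cost(engineers: int, years: int, cost_per_year: int = FULLY_LOADED_COST) -> int:
    """Total engineering labor cost in USD."""
    return engineers * years * cost_per_year


low = labor_cost(engineers=3, years=3)   # smallest team, shortest timeline
high = labor_cost(engineers=5, years=5)  # largest team, longest timeline

print(f"${low / 1e6:.1f}M - ${high / 1e6:.1f}M")  # prints "$4.5M - $12.5M"
```

Because the rate is a lower bound, both ends of the range are lower bounds too.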

Datoria represents years of accumulated work across all of these areas, validated by 170,686+ identity tests from real-world SQL sources.
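
An identity test, in the sense used here, checks that parsing and re-emitting a SQL file returns the input unchanged. The sketch below shows one plausible shape for such a check; it is an assumption about how a harness like this could look, not Datoria's test suite. `first_diff`, `identity_failures`, and `lossy_roundtrip` are hypothetical names, and `lossy_roundtrip` merely stands in for a printer that normalizes whitespace.

```python
from typing import Callable, Iterable, Optional


def first_diff(a: bytes, b: bytes) -> Optional[int]:
    """Return the offset of the first differing byte, or None if identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # Equal up to the shorter length: identical only if the lengths match.
    return None if len(a) == len(b) else min(len(a), len(b))


def identity_failures(
    roundtrip: Callable[[str], str], corpus: Iterable[str]
) -> list[tuple[str, int]]:
    """Run each statement through `roundtrip`; collect (sql, byte offset) mismatches."""
    failures = []
    for sql in corpus:
        out = roundtrip(sql)
        offset = first_diff(sql.encode("utf-8"), out.encode("utf-8"))
        if offset is not None:
            failures.append((sql, offset))
    return failures


def lossy_roundtrip(sql: str) -> str:
    """Stand-in for a reformatting printer: collapses every whitespace run."""
    return " ".join(sql.split())


corpus = ["SELECT 1", "select  a\nfrom t"]

assert identity_failures(lambda s: s, corpus) == []
assert identity_failures(lossy_roundtrip, corpus) == [("select  a\nfrom t", 7)]
```

Comparing encoded bytes rather than strings keeps the check aligned with the "byte-identical" claim, and reporting the first differing offset makes a failure easy to locate in a large file.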

Strategic Context

The SQL tooling landscape consolidated dramatically in 2025:

  • Fivetran acquired Tobiko Data (SQLGlot/SQLMesh) -- the most popular open-source SQL parser
  • dbt Labs acquired SDF Labs -- a Rust-based SQL compiler, which became the dbt Fusion engine
  • Fivetran and dbt Labs announced a merger -- combining both under one entity

When this merger closes, one company will control all the major open-source SQL compiler infrastructure: SQLGlot, SDF/Fusion, and SQLMesh. For companies that have been relying on these tools as neutral infrastructure, this creates a strategic dependency on a competitor.

Datoria is the only independent, production-grade SQL compiler covering this breadth of dialects with this depth of semantic analysis.