Performance

Datoria generates the fastest full-featured SQL parser we've been able to benchmark. In our tests it outperforms the parsers written in Rust and C and every other JVM-based parser except Druid, which is faster per file (see the table below).

Benchmark results

Measured over 1,000 iterations parsing 15 query-only SQL files on an Apple M1 Max, with warmup iterations excluded. The full results cover 21 parsers.
Full benchmark table

| Parser | Per-file time | vs Datoria | Files parsed |
|---|---|---|---|
| Druid (JVM) | 27 µs | 0.6x | 12/15 |
| Druid (native) | 27 µs | 0.6x | 15/15 |
| Datoria (JVM) | 44 µs | 1.0x | 15/15 |
| Datoria (native) | 48 µs | 1.0x | 15/15 |
| jOOQ (JVM) | 103 µs | 2.4x | 15/15 |
| sqlparser-rs | 114 µs | 2.6x | 15/15 |
| Polyglot | 115 µs | 2.6x | 15/15 |
| Calcite (native) | 146 µs | 3.3x | 15/15 |
| jOOQ (native) | 164 µs | 3.8x | 15/15 |
| sqloxide | 168 µs | 3.8x | 15/15 |
| Trino (JVM) | 175 µs | 4.0x | 14/15 |
| Flink SQL (JVM) | 200 µs | 4.6x | 6/15 |
| Spark (JVM) | 208 µs | 4.8x | 15/15 |
| pg_query | 217 µs | 5.0x | 15/15 |
| Calcite (JVM) | 221 µs | 5.1x | 6/15 |
| node-sql-parser | 784 µs | 17.9x | 15/15 |
| JSqlParser (native) | 974 µs | 22.3x | 15/15 |
| SQLGlot | 999 µs | 22.9x | 15/15 |
| JSqlParser (JVM) | 1.4 ms | 31.1x | 15/15 |
| ShardingSphere (JVM) | 3.6 ms | 83.4x | 9/15 |
| sqlparse | 5.6 ms | 127.8x | 15/15 |
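
For readers who want to reproduce this kind of measurement, the procedure described above (untimed warmup passes, a fixed number of timed iterations, an average per file) might look roughly like the sketch below. It is not the harness used for the published numbers: the class name, method names, and the stand-in "parser" are all hypothetical, and it assumes the per-file figure is total time divided by iterations times files. A production benchmark would more likely use a dedicated harness such as JMH, which also guards against dead-code elimination.

```java
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical harness sketch; not the benchmark code behind the table above. */
final class ParseBenchmarkSketch {

    /** Mean time per file in microseconds, measured after the warmup passes. */
    static double meanMicrosPerFile(List<String> files, Consumer<String> parser,
                                    int warmupIterations, int measuredIterations) {
        // Warmup: gives the JIT a chance to compile hot paths. These passes are not timed.
        for (int i = 0; i < warmupIterations; i++) {
            for (String sql : files) parser.accept(sql);
        }
        long start = System.nanoTime();
        for (int i = 0; i < measuredIterations; i++) {
            for (String sql : files) parser.accept(sql);
        }
        long elapsedNanos = System.nanoTime() - start;
        // Assumption: "per file" means total time / (iterations * number of files).
        return elapsedNanos / 1_000.0 / ((double) measuredIterations * files.size());
    }

    /** A relative-speed column: one parser's per-file time divided by the baseline's. */
    static double relativeToBaseline(double micros, double baselineMicros) {
        return micros / baselineMicros;
    }

    public static void main(String[] args) {
        List<String> files = List.of("SELECT 1", "SELECT a, b FROM t WHERE a > 1");
        // Stand-in "parser" that only splits on whitespace; swap in a real parser call.
        Consumer<String> standIn = sql -> sql.split("\\s+");
        double micros = meanMicrosPerFile(files, standIn, 100, 1000);
        System.out.printf("%.3f us per file, %.1fx vs itself%n",
                micros, relativeToBaseline(micros, micros));
    }
}
```

With this definition, a parser measured at 88 µs against a 44 µs baseline would be reported as 2.0x.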

Why it's faster than Rust and C

A JVM parser beating Rust and C parsers sounds surprising, but it's an architectural consequence, not a benchmark trick.

Generated, not interpreted. Most SQL parsers are hand-written recursive descent (sqlparser-rs, jOOQ), table-driven parsers produced by a tool such as Bison (pg_query, which wraps PostgreSQL's own parser), or ANTLR-generated with a generic runtime. Datoria generates specialized Java code from a grammar IR: concrete methods for each SQL construct, no runtime grammar interpretation. The JIT inlines, devirtualizes, and optimizes this code the same way it would for hand-written Java.
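
To make the distinction concrete, here is a toy sketch of the two styles for a single rule, `SELECT ident FROM ident`. It is illustrative only: the grammar, the class, and every method name are hypothetical, and it does not show Datoria's grammar IR or its real generated output. The interpreted version walks a grammar description in a generic loop; the generated-style version is straight-line code with one concrete method for the construct, which is the shape a JIT can inline easily. It uses records, so it assumes Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy comparison of an interpreted grammar and generated-style code. Hypothetical names throughout. */
final class GeneratedVsInterpretedSketch {

    enum Kind { SELECT, FROM, IDENT, EOF }

    record Token(Kind kind, String text) {}

    record SelectStmt(String column, String table) {}

    /** Minimal tokenizer for the toy grammar: whitespace-separated words, then an EOF marker. */
    static List<Token> tokenize(String sql) {
        List<Token> out = new ArrayList<>();
        for (String word : sql.trim().split("\\s+")) {
            if (word.isEmpty()) continue;
            if (word.equalsIgnoreCase("SELECT")) out.add(new Token(Kind.SELECT, word));
            else if (word.equalsIgnoreCase("FROM")) out.add(new Token(Kind.FROM, word));
            else out.add(new Token(Kind.IDENT, word));
        }
        out.add(new Token(Kind.EOF, ""));
        return out;
    }

    // Interpreted style: the rule is data, and a generic loop checks it at runtime.
    static final List<Kind> SELECT_RULE =
            List.of(Kind.SELECT, Kind.IDENT, Kind.FROM, Kind.IDENT, Kind.EOF);

    static SelectStmt parseInterpreted(List<Token> tokens) {
        List<String> idents = new ArrayList<>();
        for (int i = 0; i < SELECT_RULE.size(); i++) {
            Token t = expect(tokens, i, SELECT_RULE.get(i));
            if (t.kind() == Kind.IDENT) idents.add(t.text());
        }
        return new SelectStmt(idents.get(0), idents.get(1));
    }

    // Generated style: the same rule unrolled into a concrete method, with no rule lookup.
    static SelectStmt parseGenerated(List<Token> tokens) {
        expect(tokens, 0, Kind.SELECT);
        String column = expect(tokens, 1, Kind.IDENT).text();
        expect(tokens, 2, Kind.FROM);
        String table = expect(tokens, 3, Kind.IDENT).text();
        expect(tokens, 4, Kind.EOF);
        return new SelectStmt(column, table);
    }

    private static Token expect(List<Token> tokens, int pos, Kind kind) {
        Token t = tokens.get(pos);
        if (t.kind() != kind) {
            throw new IllegalArgumentException("expected " + kind + " but found " + t.kind());
        }
        return t;
    }

    public static void main(String[] args) {
        List<Token> tokens = tokenize("select id FROM users");
        System.out.println(parseInterpreted(tokens));
        System.out.println(parseGenerated(tokens));
    }
}
```

Both methods accept the same input and build the same result; the difference is only in how much work is decided at generation time rather than at parse time.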

Zero backtracking. This is the key advantage over hand-written parsers. A hand-written parser often needs speculative parsing: "try this as a subquery; if it fails, back up and try it as an expression." Datoria's generated lookahead logic picks the correct production at each decision point. No wasted work. Rust parsers like sqlparser-rs backtrack frequently on complex SQL, which is how a JVM parser ends up outperforming them despite the language overhead.
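
The subquery-versus-expression decision can be sketched on a toy grammar. The example below is a hypothetical illustration, not Datoria's generated lookahead logic: it assumes a grammar where `(` starts either `( SELECT number )` or a parenthesized sum, and it counts how many times tokens are consumed. In speculative mode the parser tries the subquery first and rewinds on failure; in predictive mode it peeks one token past the `(` and commits to the right production.

```java
import java.util.List;

/** Toy parser contrasting speculative parsing with lookahead. Hypothetical names throughout. */
final class LookaheadSketch {
    private final List<String> tokens;
    private final boolean predictive;
    private int pos = 0;
    private int advances = 0; // tokens consumed, including re-consumption after a rewind

    LookaheadSketch(List<String> tokens, boolean predictive) {
        this.tokens = tokens;
        this.predictive = predictive;
    }

    private String peek(int offset) {
        int i = pos + offset;
        return i < tokens.size() ? tokens.get(i) : "<eof>";
    }

    private String advance() {
        advances++;
        String t = peek(0);
        pos++;
        return t;
    }

    private void expect(String expected) {
        String t = advance();
        if (!t.equalsIgnoreCase(expected)) {
            throw new IllegalStateException("expected " + expected + " but found " + t);
        }
    }

    // expr := primary ('+' primary)*
    int parseExpr() {
        int value = parsePrimary();
        while (peek(0).equals("+")) {
            advance();
            value += parsePrimary();
        }
        return value;
    }

    // primary := NUMBER | '(' SELECT NUMBER ')' | '(' expr ')'
    private int parsePrimary() {
        if (!peek(0).equals("(")) return Integer.parseInt(advance());
        if (predictive) {
            // One token of extra lookahead picks the production; nothing is consumed twice.
            return peek(1).equalsIgnoreCase("SELECT") ? parseSubquery() : parseParenExpr();
        }
        // Speculative: try the subquery, then rewind and retry as an expression on failure.
        int mark = pos;
        try {
            return parseSubquery();
        } catch (IllegalStateException e) {
            pos = mark;
            return parseParenExpr();
        }
    }

    private int parseSubquery() {
        expect("(");
        expect("SELECT");
        int value = Integer.parseInt(advance());
        expect(")");
        return value;
    }

    private int parseParenExpr() {
        expect("(");
        int value = parseExpr();
        expect(")");
        return value;
    }

    /** Returns {value, advances} for the given input. */
    static int[] run(String sql, boolean predictive) {
        String spaced = sql.replace("(", " ( ").replace(")", " ) ").replace("+", " + ").trim();
        LookaheadSketch p = new LookaheadSketch(List.of(spaced.split("\\s+")), predictive);
        int value = p.parseExpr();
        return new int[] { value, p.advances };
    }

    public static void main(String[] args) {
        String sql = "((1 + 2) + 3)";
        System.out.println("predictive advances:  " + run(sql, true)[1]);
        System.out.println("speculative advances: " + run(sql, false)[1]);
    }
}
```

On the nine-token input `((1 + 2) + 3)` the predictive mode consumes 9 tokens, one per token, while the speculative mode consumes 13 because each failed subquery attempt is rewound and redone. In this toy grammar the failure is detected after two tokens; in real SQL a speculative attempt can fail much later, so the wasted work can be correspondingly larger.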

Linear time. Parsing time scales linearly with input size. There are no pathological cases: every token is consumed exactly once.

JIT-friendly patterns. The generated code is structured to suit the JIT: small methods, predictable branches, minimal allocation, cache-friendly access.

Speed comparisons only matter when the parsers do equivalent work. Datoria's parser:

  • Produces a fully-typed, lossless AST (not just a parse tree).
  • Preserves all whitespace, comments, and keyword spelling.
  • Handles 15 SQL dialects from the same codebase.
  • Includes error recovery for broken SQL.
  • Supports the full range of SQL syntax (DDL, DML, DQL, DCL).
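
One common way to get a lossless result is to attach the whitespace and comments that precede each token to that token as "leading trivia" and to keep the token's original spelling. The sketch below shows that idea only at the token level and under simplifying assumptions: it is not Datoria's AST or tokenizer, the names are hypothetical, and it handles just words, single-character punctuation, and `--` line comments (no string literals or block comments).

```java
import java.util.ArrayList;
import java.util.List;

/** Toy lossless tokenizer: every input character ends up in some token. Hypothetical names. */
final class LosslessTokensSketch {

    /** A token plus the whitespace and comments that preceded it. */
    record Token(String leadingTrivia, String text) {}

    static List<Token> tokenize(String sql) {
        List<Token> out = new ArrayList<>();
        int i = 0;
        int n = sql.length();
        while (true) {
            int triviaStart = i;
            // Collect whitespace and "--" line comments as trivia instead of discarding them.
            while (i < n) {
                if (Character.isWhitespace(sql.charAt(i))) {
                    i++;
                } else if (sql.startsWith("--", i)) {
                    while (i < n && sql.charAt(i) != '\n') i++;
                } else {
                    break;
                }
            }
            String trivia = sql.substring(triviaStart, i);
            if (i >= n) {
                // End-of-input token with empty text carries any trailing trivia.
                out.add(new Token(trivia, ""));
                return out;
            }
            int start = i;
            if (isWordChar(sql.charAt(i))) {
                while (i < n && isWordChar(sql.charAt(i))) i++;
            } else {
                i++; // single-character punctuation
            }
            // The text is stored exactly as written, so keyword spelling is preserved.
            out.add(new Token(trivia, sql.substring(start, i)));
        }
    }

    private static boolean isWordChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_';
    }

    /** Reassembles the original source text from the tokens. */
    static String render(List<Token> tokens) {
        StringBuilder sb = new StringBuilder();
        for (Token t : tokens) sb.append(t.leadingTrivia()).append(t.text());
        return sb.toString();
    }

    public static void main(String[] args) {
        String sql = "select  a, -- pick a\n  b\nFrOm t ";
        System.out.println(render(tokenize(sql)).equals(sql));
    }
}
```

Because every character lands in either a token's text or its trivia, rendering the tokens reproduces the input exactly, including odd spacing, comments, and mixed-case keywords such as `FrOm`. A parser that discards trivia or normalizes keywords during tokenization cannot offer that round trip.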

Simpler parsers that skip whitespace, produce untyped trees, or cover a subset of SQL may be faster, but they're solving a different problem.