Performance

Datoria generates the fastest full-featured SQL parser we've been able to benchmark, outperforming the Rust and C parsers tested and every JVM-based parser tested except Druid.

Benchmark Results

Measured over 1000 iterations parsing 15 query-only SQL files on Apple M1 Max. Warmup iterations excluded. Full results cover 21 parsers.
Full benchmark table

Parser                  Time per file   vs Datoria   Files parsed
Druid (JVM)             27 µs           0.6x         12/15
Druid (native)          27 µs           0.6x         15/15
Datoria (JVM)           44 µs           1.0x         15/15
Datoria (native)        48 µs           1.0x         15/15
jOOQ (JVM)              103 µs          2.4x         15/15
sqlparser-rs            114 µs          2.6x         15/15
Polyglot                115 µs          2.6x         15/15
Calcite (native)        146 µs          3.3x         15/15
jOOQ (native)           164 µs          3.8x         15/15
sqloxide                168 µs          3.8x         15/15
Trino (JVM)             175 µs          4.0x         14/15
Flink SQL (JVM)         200 µs          4.6x         6/15
Spark (JVM)             208 µs          4.8x         15/15
pg_query                217 µs          5.0x         15/15
Calcite (JVM)           221 µs          5.1x         6/15
node-sql-parser         784 µs          17.9x        15/15
JSqlParser (native)     974 µs          22.3x        15/15
SQLGlot                 999 µs          22.9x        15/15
JSqlParser (JVM)        1.4 ms          31.1x        15/15
ShardingSphere (JVM)    3.6 ms          83.4x        9/15
sqlparse                5.6 ms          127.8x       15/15
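
The measurement setup stated for these results (untimed warmup, a fixed number of timed iterations, time reported per file) can be sketched as a small harness. This is only an illustration under assumptions: `MiniBench`, `perFileMicros`, and the whitespace-splitting stand-in parser are hypothetical names, not the code behind the published numbers.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical harness: names and structure are illustrative assumptions,
// not the benchmark code that produced the table.
final class MiniBench {

    /** Returns the mean time per file in microseconds, excluding warmup iterations. */
    static double perFileMicros(List<String> files, Consumer<String> parser,
                                int warmupIterations, int timedIterations) {
        if (files.isEmpty() || timedIterations <= 0) {
            throw new IllegalArgumentException("need at least one file and one timed iteration");
        }
        // Warmup: gives the JIT a chance to compile hot paths; not timed.
        for (int i = 0; i < warmupIterations; i++) {
            for (String sql : files) parser.accept(sql);
        }
        // Timed section: every iteration parses every file once.
        long start = System.nanoTime();
        for (int i = 0; i < timedIterations; i++) {
            for (String sql : files) parser.accept(sql);
        }
        long elapsedNanos = System.nanoTime() - start;
        // nanoseconds -> microseconds, then divide by the number of parses.
        return elapsedNanos / 1_000.0 / ((double) timedIterations * files.size());
    }

    public static void main(String[] args) {
        List<String> files = List.of("SELECT 1", "SELECT a, b FROM t WHERE a > 1");
        // Stand-in "parser" so the example runs on its own.
        double micros = perFileMicros(files, sql -> sql.split("\\s+"), 100, 1000);
        System.out.printf("%.3f microseconds per file%n", micros);
    }
}
```

A hand-rolled loop like this is prone to JIT artifacts such as dead-code elimination of unused parse results, so a dedicated harness such as JMH is usually the safer choice for real measurements.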

Why It's Faster Than Rust and C

A JVM parser beating Rust and C parsers sounds surprising, but it's an architectural consequence, not a benchmark trick.

Generated, not interpreted. Most SQL parsers are hand-written recursive descent (sqlparser-rs, jOOQ), wrappers around a Bison-generated, table-driven parser (pg_query, which reuses PostgreSQL's own parser), or ANTLR-generated with a generic runtime. Datoria generates specialized Java code from a grammar IR: concrete methods for each SQL construct, with no runtime grammar interpretation. The JIT compiler inlines, devirtualizes, and optimizes this code the same way it would hand-written Java.

Zero backtracking. This is the key advantage over hand-written parsers. A hand-written parser often needs speculative parsing: "try parsing this as a subquery, and if that fails, backtrack and try it as an expression." Datoria's generated lookahead logic deterministically selects the correct production at each decision point, so no input is ever parsed twice. Rust parsers like sqlparser-rs backtrack frequently on complex SQL, which is why a JVM parser can outperform them despite the JVM's runtime overhead.
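
To make the subquery-versus-expression decision concrete, here is a minimal sketch of lookahead-driven production selection on a toy grammar. It is not Datoria's generated code: the grammar, the `PredictiveParser` class, and the string-based AST are assumptions made for illustration, and real SQL generally needs more than the single token of lookahead used here.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: a toy grammar, not Datoria's generated parser.
//   expr    := primary ('+' primary)*
//   primary := NUMBER | '(' 'SELECT' NUMBER ')' | '(' expr ')'
final class PredictiveParser {
    private final List<String> tokens;
    private int pos = 0;
    int tokensConsumed = 0; // incremented once per token, never rewound

    PredictiveParser(List<String> tokens) { this.tokens = tokens; }

    static List<String> tokenize(String input) {
        String spaced = input.replace("(", " ( ").replace(")", " ) ").replace("+", " + ");
        List<String> out = new ArrayList<>();
        for (String t : spaced.trim().split("\\s+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    private String peek(int offset) {
        int i = pos + offset;
        return i < tokens.size() ? tokens.get(i) : "<eof>";
    }

    private String next() {
        String t = peek(0);
        if (t.equals("<eof>")) throw new IllegalStateException("unexpected end of input");
        pos++;
        tokensConsumed++;
        return t;
    }

    private void expect(String expected) {
        String t = next();
        if (!t.equalsIgnoreCase(expected)) {
            throw new IllegalStateException("expected " + expected + " but found " + t);
        }
    }

    private String parseNumber() {
        String t = next();
        if (!t.matches("\\d+")) throw new IllegalStateException("expected a number but found " + t);
        return t;
    }

    String parseAll() {
        String ast = parseExpr();
        if (!peek(0).equals("<eof>")) throw new IllegalStateException("trailing input: " + peek(0));
        return ast;
    }

    private String parseExpr() {
        String left = parsePrimary();
        while (peek(0).equals("+")) {
            next();
            left = "Add(" + left + ", " + parsePrimary() + ")";
        }
        return left;
    }

    private String parsePrimary() {
        if (peek(0).equals("(")) {
            // Decision point: the token after '(' selects the production,
            // so nothing is parsed speculatively and nothing is rewound.
            if (peek(1).equalsIgnoreCase("SELECT")) {
                expect("(");
                expect("SELECT");
                String value = parseNumber();
                expect(")");
                return "Subquery(" + value + ")";
            }
            expect("(");
            String inner = parseExpr();
            expect(")");
            return "Paren(" + inner + ")";
        }
        return "Num(" + parseNumber() + ")";
    }

    public static void main(String[] args) {
        List<String> toks = tokenize("(SELECT 1) + (2 + 3)");
        PredictiveParser p = new PredictiveParser(toks);
        System.out.println(p.parseAll());
        System.out.println(p.tokensConsumed + " of " + toks.size() + " tokens consumed");
    }
}
```

Because `peek` never advances the position and `next` is the only method that does, the `tokensConsumed` counter ends up equal to the token count for any accepted input. A speculative parser would instead re-read the same tokens after a failed attempt.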

Linear time. Parsing scales linearly with input size. There are no pathological cases where complex SQL causes exponential blowup — every token is consumed exactly once.

JIT-friendly patterns. The generated code is structured to play well with the JVM's JIT compiler: small methods, predictable branches, minimal allocation, and cache-friendly access patterns.

Speed comparisons only matter when comparing equivalent functionality. Datoria's parser:

  • Produces a fully-typed, lossless AST (not just a parse tree)
  • Preserves all whitespace, comments, and keyword spelling
  • Handles 15 SQL dialects from the same codebase
  • Includes error recovery for broken SQL
  • Supports the full range of SQL syntax (DDL, DML, DQL, DCL)
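
As a rough illustration of what "lossless" means in the list above, the sketch below attaches leading whitespace and comments to each token and keeps the original spelling, so that concatenating the tokens reproduces the input exactly. The `LosslessTokens` class and its `Token` shape are hypothetical; Datoria's actual AST representation is not described here and will differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of lossless tokens: each token keeps the trivia
// (whitespace and "--" line comments) before it and its exact source spelling.
final class LosslessTokens {

    static final class Token {
        final String leadingTrivia; // whitespace and comments preceding the token
        final String text;          // exact spelling, e.g. "select" vs "SELECT"

        Token(String leadingTrivia, String text) {
            this.leadingTrivia = leadingTrivia;
            this.text = text;
        }
    }

    private static boolean isWordChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_';
    }

    static List<Token> tokenize(String src) {
        List<Token> out = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            int triviaStart = i;
            // Collect whitespace and line comments as trivia.
            while (i < src.length()) {
                if (Character.isWhitespace(src.charAt(i))) {
                    i++;
                } else if (src.startsWith("--", i)) {
                    while (i < src.length() && src.charAt(i) != '\n') i++;
                } else {
                    break;
                }
            }
            String trivia = src.substring(triviaStart, i);
            int textStart = i;
            if (i < src.length()) {
                if (isWordChar(src.charAt(i))) {
                    while (i < src.length() && isWordChar(src.charAt(i))) i++;
                } else {
                    i++; // single-character punctuation
                }
            }
            // At end of input the text is empty; that token carries trailing trivia.
            out.add(new Token(trivia, src.substring(textStart, i)));
        }
        return out;
    }

    /** Rebuilds the source text from the tokens alone. */
    static String render(List<Token> tokens) {
        StringBuilder sb = new StringBuilder();
        for (Token t : tokens) sb.append(t.leadingTrivia).append(t.text);
        return sb.toString();
    }

    public static void main(String[] args) {
        String src = "select  a, -- pick a\n  B\nFrom t ;\n";
        System.out.println(render(tokenize(src)).equals(src));
    }
}
```

A parser that discards trivia or normalizes keyword case cannot offer this round-trip guarantee, which is the extra work a lossless parser does on every token.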

Simpler parsers that skip whitespace, produce untyped trees, or only handle a subset of SQL may be faster — but they're solving a different problem.