Performance

Datoria generates the fastest full-featured SQL parser we've been able to benchmark. It outperforms the Rust and C parsers we tested and every other JVM-based parser in the table below except Druid.

Benchmark results

Measured over 1,000 iterations of parsing 15 query-only SQL files on an Apple M1 Max, with warmup iterations excluded. The full results cover 21 parsers.
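To make the methodology concrete, here is a minimal sketch of a harness with the same shape: untimed warmup, a fixed number of timed iterations over all files, and a per-file mean. The class name `BenchSketch`, its methods, the warmup count, and the stand-in parser are all assumptions for illustration. This is not the harness that produced the numbers on this page, and the use of a mean is itself an assumption.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical harness: the page only states the iteration count, the file
// count, and that warmup is excluded. Everything else here is assumed.
final class BenchSketch {
    // Mean time per file in microseconds for one timed run.
    static double perFileMicros(long totalNanos, int iterations, int fileCount) {
        return totalNanos / 1_000.0 / ((double) iterations * fileCount);
    }

    // Ratio for a "vs baseline" column: 1.0 is the baseline itself.
    static double relativeTo(double baselineMicros, double parserMicros) {
        return parserMicros / baselineMicros;
    }

    static double measure(Consumer<String> parser, List<String> files,
                          int warmup, int iterations) {
        // Untimed warmup so the JIT can compile the hot paths first.
        for (int i = 0; i < warmup; i++) {
            for (String sql : files) parser.accept(sql);
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            for (String sql : files) parser.accept(sql);
        }
        return perFileMicros(System.nanoTime() - start, iterations, files.size());
    }

    public static void main(String[] args) {
        List<String> files = List.of("SELECT 1", "SELECT a, b FROM t WHERE a > 1");
        // Stand-in "parser" that only splits on whitespace.
        double micros = measure(sql -> sql.split("\\s+"), files, 200, 1000);
        System.out.printf("%.3f us per file%n", micros);
    }
}
```

A loop like this discards the parser's result, so the JIT may optimize part of the work away. A dedicated benchmarking framework such as JMH guards against that and is the safer choice for real measurements.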

Full benchmark table

Parser	Per-File	vs Datoria	Files Parsed
Druid (JVM)	27 µs	0.6x	12/15
Druid (native)	27 µs	0.6x	15/15
Datoria (JVM)	44 µs	1.0x	15/15
Datoria (native)	48 µs	1.0x	15/15
jOOQ (JVM)	103 µs	2.4x	15/15
sqlparser-rs	114 µs	2.6x	15/15
Polyglot	115 µs	2.6x	15/15
Calcite (native)	146 µs	3.3x	15/15
jOOQ (native)	164 µs	3.8x	15/15
sqloxide	168 µs	3.8x	15/15
Trino (JVM)	175 µs	4.0x	14/15
Flink SQL (JVM)	200 µs	4.6x	6/15
Spark (JVM)	208 µs	4.8x	15/15
pg_query	217 µs	5.0x	15/15
Calcite (JVM)	221 µs	5.1x	6/15
node-sql-parser	784 µs	17.9x	15/15
JSqlParser (native)	974 µs	22.3x	15/15
SQLGlot	999 µs	22.9x	15/15
JSqlParser (JVM)	1.4 ms	31.1x	15/15
ShardingSphere (JVM)	3.6 ms	83.4x	9/15
sqlparse	5.6 ms	127.8x	15/15

Why it's faster than Rust and C

A JVM parser beating Rust and C parsers sounds surprising, but it's an architectural consequence, not a benchmark trick.

Generated, not interpreted. Most SQL parsers are hand-written recursive descent (sqlparser-rs, jOOQ), built on a conventional parser generator (pg_query wraps PostgreSQL's Bison grammar), or ANTLR-generated with a generic runtime. Datoria generates specialized Java code from a grammar IR: concrete methods for each SQL construct, with no runtime grammar interpretation. The JIT inlines, devirtualizes, and optimizes this code the same way it would for hand-written Java.
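The difference between interpreting a grammar and running generated code can be sketched with a single toy rule, `LIMIT <number>`. The class `GeneratedVsInterpreted`, the `Step` type, and both methods are hypothetical names invented for this illustration. Datoria's actual grammar IR and generated output are not shown on this page and will look different.

```java
import java.util.List;

// Hypothetical illustration only: one toy rule, expressed two ways.
final class GeneratedVsInterpreted {
    // Interpreted style: the rule is data that a generic loop walks at parse time.
    static final class Step {
        final boolean isKeyword;
        final String text;
        Step(boolean isKeyword, String text) {
            this.isKeyword = isKeyword;
            this.text = text;
        }
    }

    static final List<Step> LIMIT_RULE =
        List.of(new Step(true, "LIMIT"), new Step(false, "NUMBER"));

    static boolean interpret(List<Step> rule, List<String> tokens) {
        if (tokens.size() != rule.size()) return false;
        for (int i = 0; i < rule.size(); i++) {
            Step step = rule.get(i);
            String token = tokens.get(i);
            // Every step pays for a data-dependent branch on the rule contents.
            boolean ok = step.isKeyword
                ? token.equalsIgnoreCase(step.text)
                : isNumber(token);
            if (!ok) return false;
        }
        return true;
    }

    // Generated style: the same rule emitted as a concrete, straight-line method.
    static boolean parseLimitClause(List<String> tokens) {
        return tokens.size() == 2
            && tokens.get(0).equalsIgnoreCase("LIMIT")
            && isNumber(tokens.get(1));
    }

    static boolean isNumber(String token) {
        return !token.isEmpty() && token.chars().allMatch(Character::isDigit);
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("limit", "10");
        System.out.println(interpret(LIMIT_RULE, tokens));
        System.out.println(parseLimitClause(tokens));
    }
}
```

Both versions accept the same inputs. The generated form gives the JIT a small method with fixed branches, which is the kind of code it can inline and optimize like hand-written Java.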

Zero backtracking. This is the key advantage over hand-written parsers. A hand-written parser often needs speculative parsing: "try this as a subquery; if it fails, back up and try it as an expression." Datoria's generated lookahead logic picks the correct production at each decision point. No wasted work. Rust parsers like sqlparser-rs backtrack frequently on complex SQL, which is how a JVM parser ends up outperforming them despite the language overhead.
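The subquery-versus-expression decision described above can be sketched as follows. This is a hypothetical toy parser over a flat token list (the class `LookaheadSketch` and all of its methods are invented for illustration), not Datoria's generated code. It contrasts a speculative parse that rewinds on failure with a predictive parse that inspects one token after the opening parenthesis and commits.

```java
import java.util.List;

// Hypothetical toy parser: handles one unnested parenthesized group.
final class LookaheadSketch {
    private final List<String> tokens;
    private int pos = 0;
    int rewinds = 0; // number of times the parser had to back up

    LookaheadSketch(List<String> tokens) {
        this.tokens = tokens;
    }

    private String peek(int offset) {
        int i = pos + offset;
        return i < tokens.size() ? tokens.get(i) : "<eof>";
    }

    // Backtracking style: try a subquery first, rewind if that fails.
    String parseParenSpeculative() {
        int mark = pos;
        if (tryParseSubquery()) return "subquery";
        pos = mark;   // discard the work done by the failed attempt
        rewinds++;
        parseParenExpr();
        return "expression";
    }

    // Predictive style: look at the token after "(" and commit to one path.
    String parseParenPredictive() {
        if (peek(0).equals("(") && peek(1).equalsIgnoreCase("SELECT")) {
            if (!tryParseSubquery()) throw new IllegalStateException("malformed subquery");
            return "subquery";
        }
        parseParenExpr();
        return "expression";
    }

    private boolean tryParseSubquery() {
        if (!peek(0).equals("(")) return false;
        pos++;
        if (!peek(0).equalsIgnoreCase("SELECT")) return false;
        return consumeThroughCloseParen();
    }

    private void parseParenExpr() {
        if (!peek(0).equals("(")) throw new IllegalStateException("expected (");
        pos++;
        if (!consumeThroughCloseParen()) throw new IllegalStateException("expected )");
    }

    // Skips tokens up to and including the next ")". Returns false if none is found.
    private boolean consumeThroughCloseParen() {
        while (pos < tokens.size() && !tokens.get(pos).equals(")")) pos++;
        if (pos == tokens.size()) return false;
        pos++;
        return true;
    }

    public static void main(String[] args) {
        List<String> expr = List.of("(", "a", "+", "b", ")");
        LookaheadSketch speculative = new LookaheadSketch(expr);
        LookaheadSketch predictive = new LookaheadSketch(expr);
        System.out.println(speculative.parseParenSpeculative() + " rewinds=" + speculative.rewinds);
        System.out.println(predictive.parseParenPredictive() + " rewinds=" + predictive.rewinds);
    }
}
```

In this toy case the failed attempt wastes only one token. In a real grammar a speculative parse can fail much later in the construct, so the discarded work grows with the size of the input it had already consumed.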

Linear time. Parsing scales linearly with input size. There are no pathological cases: every token is consumed exactly once.

JIT-friendly patterns. The generated code is structured to suit the JIT: small methods, predictable branches, minimal allocation, cache-friendly access.

What "full-featured" means

Speed comparisons only matter when the parsers do equivalent work. Datoria's parser:

Produces a fully-typed, lossless AST (not just a parse tree).
Preserves all whitespace, comments, and keyword spelling.
Handles 15 SQL dialects from the same codebase.
Includes error recovery for broken SQL.
Supports the full range of SQL syntax (DDL, DML, DQL, DCL).

Simpler parsers that skip whitespace, produce untyped trees, or cover a subset of SQL may be faster, but they're solving a different problem.
