Performance
Datoria generates the fastest full-featured SQL parser we've been able to benchmark. It outperforms the Rust and C parsers we tested and every other JVM-based parser except Druid, which the table below shows at 27 µs per file.
Benchmark results
Measured over 1,000 iterations parsing 15 query-only SQL files on an Apple M1 Max, with warmup iterations excluded. The full results cover 21 parsers.
Full benchmark table
| Parser | Per-File | vs Datoria | Files Parsed |
|---|---|---|---|
| Druid (JVM) | 27 µs | 0.6x | 12/15 |
| Druid (native) | 27 µs | 0.6x | 15/15 |
| Datoria (JVM) | 44 µs | 1.0x | 15/15 |
| Datoria (native) | 48 µs | 1.0x | 15/15 |
| jOOQ (JVM) | 103 µs | 2.4x | 15/15 |
| sqlparser-rs | 114 µs | 2.6x | 15/15 |
| Polyglot | 115 µs | 2.6x | 15/15 |
| Calcite (native) | 146 µs | 3.3x | 15/15 |
| jOOQ (native) | 164 µs | 3.8x | 15/15 |
| sqloxide | 168 µs | 3.8x | 15/15 |
| Trino (JVM) | 175 µs | 4.0x | 14/15 |
| Flink SQL (JVM) | 200 µs | 4.6x | 6/15 |
| Spark (JVM) | 208 µs | 4.8x | 15/15 |
| pg_query | 217 µs | 5.0x | 15/15 |
| Calcite (JVM) | 221 µs | 5.1x | 6/15 |
| node-sql-parser | 784 µs | 17.9x | 15/15 |
| JSqlParser (native) | 974 µs | 22.3x | 15/15 |
| SQLGlot | 999 µs | 22.9x | 15/15 |
| JSqlParser (JVM) | 1.4 ms | 31.1x | 15/15 |
| ShardingSphere (JVM) | 3.6 ms | 83.4x | 9/15 |
| sqlparse | 5.6 ms | 127.8x | 15/15 |
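The methodology described above (untimed warmup, then a fixed number of timed iterations over a set of files) can be sketched roughly as follows. This is a minimal illustration, not the harness behind the table: the class `ParseBench`, both method names, and the reading of the "vs Datoria" column as per-file time divided by the baseline's per-file time are all assumptions. A real JVM benchmark would more likely use a dedicated harness such as JMH.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical harness: names and structure are illustrative assumptions.
final class ParseBench {

    /** Mean nanoseconds per file; warmup runs are executed but not timed. */
    static double meanNanosPerFile(Consumer<String> parser, List<String> files,
                                   int warmup, int iterations) {
        for (int i = 0; i < warmup; i++) {
            for (String sql : files) {
                parser.accept(sql);          // gives the JIT time to compile hot paths
            }
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            for (String sql : files) {
                parser.accept(sql);
            }
        }
        long elapsed = System.nanoTime() - start;
        return (double) elapsed / ((long) iterations * files.size());
    }

    /** Assumed meaning of a relative column: per-file time over the baseline's. */
    static double relativeTo(double perFile, double baselinePerFile) {
        return perFile / baselinePerFile;
    }
}
```

Under that reading, a parser taking twice the baseline's per-file time would be reported as 2.0x, and values below 1.0x indicate a parser faster than the baseline.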
Why it's faster than Rust and C
A JVM parser beating Rust and C parsers sounds surprising, but it's an architectural consequence, not a benchmark trick.
Generated, not interpreted. Most SQL parsers are hand-written recursive descent (sqlparser-rs, jOOQ), built with a classic parser generator (pg_query wraps PostgreSQL's Bison-generated parser), or ANTLR-generated with a generic runtime. Datoria generates specialized Java code from a grammar IR: concrete methods for each SQL construct, no runtime grammar interpretation. The JIT inlines, devirtualizes, and optimizes this code the same way it would for hand-written Java.
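To make the distinction concrete, here is a toy contrast between walking a rule as data at run time and emitting the same rule as a concrete method. Nothing here reflects Datoria's actual grammar IR or generated output, which this page does not show: the class, the `<ident>` placeholder convention, and the `DROP TABLE` rule are all invented for illustration.

```java
import java.util.List;

// Illustrative only: not Datoria's IR or its generated code.
final class GeneratedVsInterpreted {

    /** Interpreted style: the rule is data, and a generic loop walks it on every parse. */
    static boolean matchInterpreted(List<String> rule, List<String> tokens) {
        if (rule.size() != tokens.size()) return false;
        for (int i = 0; i < rule.size(); i++) {
            String element = rule.get(i);
            boolean ok = element.equals("<ident>")
                    ? isIdentifier(tokens.get(i))
                    : element.equalsIgnoreCase(tokens.get(i));
            if (!ok) return false;
        }
        return true;
    }

    /** Generated style: the same rule emitted ahead of time as straight-line code. */
    static boolean matchDropTable(List<String> tokens) {
        return tokens.size() == 3
                && "DROP".equalsIgnoreCase(tokens.get(0))
                && "TABLE".equalsIgnoreCase(tokens.get(1))
                && isIdentifier(tokens.get(2));
    }

    static boolean isIdentifier(String token) {
        return token.matches("[A-Za-z_][A-Za-z0-9_]*");
    }
}
```

Both methods accept the same inputs, but the second has no per-element dispatch on grammar data, which is the kind of code a JIT compiler can inline and specialize most easily.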
Zero backtracking. This is the key advantage over hand-written parsers. A hand-written parser often needs speculative parsing: "try this as a subquery; if it fails, back up and try it as an expression." Datoria's generated lookahead logic picks the correct production at each decision point. No wasted work. Rust parsers like sqlparser-rs backtrack frequently on complex SQL, which is how a JVM parser ends up outperforming them despite the language overhead.
Linear time. Parsing time scales linearly with input size. There are no pathological cases: every token is consumed exactly once.
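The two properties above (choosing a production by lookahead instead of trying and rewinding, and consuming each token once) can be illustrated with a deliberately tiny predictive parser. This is a sketch under stated assumptions, not Datoria's generated lookahead logic: the three-production grammar, the class `PredictiveSketch`, and the `consumed` counter are hypothetical, and real SQL needs far richer lookahead decisions than a single token.

```java
import java.util.List;

// Hypothetical toy grammar:
//   primary := '(' select ')' | '(' primary ')' | NUMBER
//   select  := SELECT primary
final class PredictiveSketch {
    private final List<String> tokens;
    private int pos = 0;      // only ever increases: the parser never rewinds
    int consumed = 0;         // number of token consumptions, for illustration

    PredictiveSketch(List<String> tokens) { this.tokens = tokens; }

    private String peek(int k) {
        int i = pos + k;
        return i < tokens.size() ? tokens.get(i) : "<eof>";
    }

    private String next() {
        consumed++;
        return tokens.get(pos++);
    }

    private void expect(String expected) {
        String got = peek(0);
        if (!got.equalsIgnoreCase(expected)) {
            throw new IllegalStateException("expected " + expected + " but found " + got);
        }
        next();
    }

    String parsePrimary() {
        if (peek(0).equals("(")) {
            // Decision point: peeking one token past '(' selects the production,
            // so nothing is parsed speculatively and thrown away.
            if (peek(1).equalsIgnoreCase("SELECT")) {
                expect("(");
                String query = parseSelect();
                expect(")");
                return "subquery(" + query + ")";
            }
            expect("(");
            String inner = parsePrimary();
            expect(")");
            return "paren(" + inner + ")";
        }
        String token = peek(0);
        if (!token.matches("\\d+")) {
            throw new IllegalStateException("expected a number but found " + token);
        }
        return "num(" + next() + ")";
    }

    private String parseSelect() {
        expect("SELECT");
        return "select(" + parsePrimary() + ")";
    }
}
```

After a successful parse, `consumed` equals the number of input tokens, which is the linear-time property in miniature. A backtracking parser over the same grammar would instead attempt the subquery production, fail, rewind `pos`, and consume some tokens a second time.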
JIT-friendly patterns. The generated code is structured to suit the JIT: small methods, predictable branches, minimal allocation, cache-friendly access.
What "full-featured" means
Speed comparisons only matter when the parsers do equivalent work. Datoria's parser:
- Produces a fully-typed, lossless AST (not just a parse tree).
- Preserves all whitespace, comments, and keyword spelling.
- Handles 15 SQL dialects from the same codebase.
- Includes error recovery for broken SQL.
- Supports the full range of SQL syntax (DDL, DML, DQL, DCL).
Simpler parsers that skip whitespace, produce untyped trees, or cover a subset of SQL may be faster, but they're solving a different problem.
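One common way to preserve whitespace, comments, and keyword spelling is to attach the preceding "trivia" to each token and keep the token's exact source text. The sketch below shows that idea with a minimal tokenizer whose output can be rendered back to the original input. It is an assumption about the general technique, not a description of Datoria's AST: `LosslessSketch`, `Token`, and the simplified tokenization rules (words, single-character symbols, `--` line comments only) are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a toy lossless token stream, not Datoria's AST.
final class LosslessSketch {

    /** A token that keeps the trivia before it and its original spelling. */
    static final class Token {
        final String leadingTrivia;  // whitespace and comments preceding the token
        final String text;           // exact source spelling, e.g. "select" vs "SELECT"

        Token(String leadingTrivia, String text) {
            this.leadingTrivia = leadingTrivia;
            this.text = text;
        }
    }

    static List<Token> tokenize(String sql) {
        List<Token> out = new ArrayList<>();
        int i = 0;
        int n = sql.length();
        while (true) {
            int triviaStart = i;
            while (i < n) {                       // collect whitespace and "--" comments
                if (Character.isWhitespace(sql.charAt(i))) {
                    i++;
                } else if (sql.startsWith("--", i)) {
                    while (i < n && sql.charAt(i) != '\n') i++;
                } else {
                    break;
                }
            }
            String trivia = sql.substring(triviaStart, i);
            if (i == n) {
                out.add(new Token(trivia, ""));   // end-of-input token holds trailing trivia
                return out;
            }
            int start = i;
            if (isWordChar(sql.charAt(i))) {
                while (i < n && isWordChar(sql.charAt(i))) i++;
            } else {
                i++;                              // any other character is its own token
            }
            out.add(new Token(trivia, sql.substring(start, i)));
        }
    }

    /** Concatenating trivia and text in order reproduces the input exactly. */
    static String render(List<Token> tokens) {
        StringBuilder sb = new StringBuilder();
        for (Token t : tokens) sb.append(t.leadingTrivia).append(t.text);
        return sb.toString();
    }

    private static boolean isWordChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_';
    }
}
```

Because every input character lands in either a token's trivia or its text, `render(tokenize(sql))` returns the original string, including the lowercase `select` and the comment in an input like `"select  a -- pick a\nFROM t ;\n"`. A parser that discards trivia or normalizes keyword case does less work per token, which is part of why raw speed comparisons across parsers need this context.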