Performance
Datoria generates the fastest full-featured SQL parser we've been able to benchmark: it outperforms the parsers written in Rust and C, and every other JVM-based parser tested except Druid, the only entry in the results table with a lower per-file time.
Benchmark Results
Measured over 1,000 iterations parsing 15 query-only SQL files on an Apple M1 Max, with warmup iterations excluded. Full results cover 21 parsers.
| Parser | Per-File | vs Datoria | Files Parsed |
|---|---|---|---|
| Druid (JVM) | 27 µs | 0.6x | 12/15 |
| Druid (native) | 27 µs | 0.6x | 15/15 |
| Datoria (JVM) | 44 µs | 1.0x | 15/15 |
| Datoria (native) | 48 µs | 1.0x | 15/15 |
| jOOQ (JVM) | 103 µs | 2.4x | 15/15 |
| sqlparser-rs | 114 µs | 2.6x | 15/15 |
| Polyglot | 115 µs | 2.6x | 15/15 |
| Calcite (native) | 146 µs | 3.3x | 15/15 |
| jOOQ (native) | 164 µs | 3.8x | 15/15 |
| sqloxide | 168 µs | 3.8x | 15/15 |
| Trino (JVM) | 175 µs | 4.0x | 14/15 |
| Flink SQL (JVM) | 200 µs | 4.6x | 6/15 |
| Spark (JVM) | 208 µs | 4.8x | 15/15 |
| pg_query | 217 µs | 5.0x | 15/15 |
| Calcite (JVM) | 221 µs | 5.1x | 6/15 |
| node-sql-parser | 784 µs | 17.9x | 15/15 |
| JSqlParser (native) | 974 µs | 22.3x | 15/15 |
| SQLGlot | 999 µs | 22.9x | 15/15 |
| JSqlParser (JVM) | 1.4 ms | 31.1x | 15/15 |
| ShardingSphere (JVM) | 3.6 ms | 83.4x | 9/15 |
| sqlparse | 5.6 ms | 127.8x | 15/15 |
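The methodology behind the table (discard warmup, average over measured iterations, express each result as a multiple of the Datoria baseline) can be sketched as below. This is a minimal illustration, not the harness that produced these numbers: `BenchSketch`, `meanMicrosPerFile`, `relativeTo`, and the whitespace-splitting stand-in parser are all hypothetical, and reading the "vs Datoria" column as other time divided by baseline time is inferred from the rows, not stated in the source.

```java
import java.util.List;
import java.util.function.Consumer;

public class BenchSketch {
    /** Mean time per file in microseconds, with warmup iterations excluded. */
    static double meanMicrosPerFile(Consumer<String> parser, List<String> files,
                                    int warmup, int measured) {
        // Warmup passes give the JIT time to compile hot paths; their timings are discarded.
        for (int i = 0; i < warmup; i++) {
            for (String sql : files) parser.accept(sql);
        }
        long start = System.nanoTime();
        for (int i = 0; i < measured; i++) {
            for (String sql : files) parser.accept(sql);
        }
        long elapsedNanos = System.nanoTime() - start;
        // Nanoseconds to microseconds, then divide by the number of parse calls.
        return elapsedNanos / 1_000.0 / ((double) measured * files.size());
    }

    /** Assumed meaning of the "vs Datoria" column: values above 1.0 are slower than the baseline. */
    static double relativeTo(double baselineMicros, double otherMicros) {
        return otherMicros / baselineMicros;
    }

    public static void main(String[] args) {
        List<String> files = List.of("SELECT 1", "SELECT a, b FROM t WHERE a > 1");
        // Stand-in "parser" (hypothetical): a real run would call each parser's API here.
        Consumer<String> toyParser = sql -> sql.split("\\s+");
        double micros = meanMicrosPerFile(toyParser, files, 100, 1000);
        System.out.println(micros + " microseconds per file");
        System.out.println(relativeTo(44.0, 114.0)); // the sqlparser-rs row against Datoria (JVM)
    }
}
```

With the table's rounded figures, 114 µs against a 44 µs baseline gives roughly 2.6, which matches the sqlparser-rs row. A production harness would also need to keep the JIT from eliminating unused parse results, which a framework such as JMH handles for you.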
Why It's Faster Than Rust and C
A JVM parser beating Rust and C parsers sounds surprising, but it's an architectural consequence, not a benchmark trick.
Generated, not interpreted. Most SQL parsers are hand-written recursive descent (sqlparser-rs, jOOQ), built on a yacc-style generated grammar (pg_query, which wraps PostgreSQL's own parser), or ANTLR-generated with a generic runtime. Datoria generates specialized Java code from a grammar IR: concrete methods for each SQL construct, with no runtime grammar interpretation. The JIT compiler inlines, devirtualizes, and optimizes this code the same way it would hand-written Java.
Zero backtracking. This is the key advantage over hand-written parsers. A hand-written parser often needs speculative parsing: "try parsing this as a subquery, and if that fails, backtrack and try it as an expression." Datoria's generated lookahead logic deterministically selects the correct production at each decision point, so no tokens are parsed and then thrown away. Rust parsers like sqlparser-rs backtrack frequently on complex SQL, which is why a JVM parser can outperform them despite the language overhead.
Linear time. Parsing scales linearly with input size. There are no pathological cases where complex SQL causes exponential blowup — every token is consumed exactly once.
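To make the lookahead idea concrete, here is a toy predictive parser for the subquery-versus-expression decision. It is a sketch under stated assumptions, not Datoria's generated code: the class `PredictiveSketch`, its three-production grammar, and the `consumed` counter are hypothetical, and a real SQL grammar needs far richer lookahead than one extra token.

```java
import java.util.ArrayList;
import java.util.List;

public class PredictiveSketch {
    private final List<String> tokens;
    private int pos = 0;
    int consumed = 0; // counts every token consumption, to show nothing is re-read

    PredictiveSketch(String input) {
        this.tokens = tokenize(input);
    }

    static List<String> tokenize(String s) {
        List<String> out = new ArrayList<>();
        String spaced = s.replace("(", " ( ").replace(")", " ) ").trim();
        for (String t : spaced.split("\\s+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    int tokenCount() { return tokens.size(); }

    private String peek(int k) {
        int i = pos + k;
        return i < tokens.size() ? tokens.get(i) : "<eof>";
    }

    private String next() {
        consumed++;
        return tokens.get(pos++);
    }

    private void expect(String expected) {
        String got = peek(0);
        if (!got.equalsIgnoreCase(expected)) {
            throw new IllegalStateException("expected " + expected + " but found " + got);
        }
        next();
    }

    /** primary := '(' SELECT primary ')' | '(' primary ')' | atom */
    String parsePrimary() {
        if (peek(0).equals("(")) {
            // Decision point: the token after "(" selects the production.
            // Peeking does not consume, so no speculative attempt is ever undone.
            if (peek(1).equalsIgnoreCase("SELECT")) return parseSubquery();
            return parseParenthesized();
        }
        String t = peek(0);
        if (t.equals("<eof>") || t.equals(")")) {
            throw new IllegalStateException("unexpected " + t);
        }
        return "atom(" + next() + ")";
    }

    private String parseSubquery() {
        expect("(");
        expect("SELECT");
        String body = parsePrimary();
        expect(")");
        return "subquery(" + body + ")";
    }

    private String parseParenthesized() {
        expect("(");
        String inner = parsePrimary();
        expect(")");
        return "paren(" + inner + ")";
    }

    public static void main(String[] args) {
        PredictiveSketch p = new PredictiveSketch("((SELECT 1))");
        System.out.println(p.parsePrimary());
        System.out.println(p.consumed + " consumptions for " + p.tokenCount() + " tokens");
    }
}
```

For the input `((SELECT 1))` the parser makes six consumptions for six tokens. A backtracking parser that first tried the subquery production on the outer parenthesis would consume some of those tokens twice.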
JIT-friendly patterns. The generated code is structured to play well with the JVM's JIT compiler: small methods, predictable branches, minimal allocation, and cache-friendly access patterns.
What "Full-Featured" Means
Speed comparisons only matter when comparing equivalent functionality. Datoria's parser:
- Produces a fully typed, lossless AST (not just a parse tree)
- Preserves all whitespace, comments, and keyword spelling
- Handles 15 SQL dialects from the same codebase
- Includes error recovery for broken SQL
- Supports the full range of SQL syntax (DDL, DML, DQL, DCL)
Simpler parsers that skip whitespace, produce untyped trees, or only handle a subset of SQL may be faster — but they're solving a different problem.
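The "lossless" property in the list above can be illustrated at the token level: if whitespace, comments, and the original keyword spelling are kept as data, concatenating the pieces reproduces the input exactly. The sketch below is an assumption-laden toy, not Datoria's AST; `LosslessSketch`, `Token`, and `Kind` are hypothetical names, and only `--` line comments are recognized.

```java
import java.util.ArrayList;
import java.util.List;

public class LosslessSketch {
    enum Kind { WORD, WHITESPACE, COMMENT, SYMBOL }

    static final class Token {
        final Kind kind;
        final String text; // exact source text, including original keyword casing

        Token(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }
    }

    private static boolean isWordChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_';
    }

    /** Splits the input into contiguous tokens; nothing is skipped or normalized. */
    static List<Token> tokenize(String sql) {
        List<Token> out = new ArrayList<>();
        int i = 0;
        while (i < sql.length()) {
            int start = i;
            char c = sql.charAt(i);
            Kind kind;
            if (Character.isWhitespace(c)) {
                while (i < sql.length() && Character.isWhitespace(sql.charAt(i))) i++;
                kind = Kind.WHITESPACE;
            } else if (sql.startsWith("--", i)) {
                // Line comment: runs up to, but not including, the newline.
                while (i < sql.length() && sql.charAt(i) != '\n') i++;
                kind = Kind.COMMENT;
            } else if (isWordChar(c)) {
                while (i < sql.length() && isWordChar(sql.charAt(i))) i++;
                kind = Kind.WORD;
            } else {
                i++;
                kind = Kind.SYMBOL;
            }
            out.add(new Token(kind, sql.substring(start, i)));
        }
        return out;
    }

    /** Reassembles the source text from the tokens. */
    static String render(List<Token> tokens) {
        StringBuilder sb = new StringBuilder();
        for (Token t : tokens) sb.append(t.text);
        return sb.toString();
    }

    public static void main(String[] args) {
        String sql = "select  a -- keep me\nFrom t";
        System.out.println(render(tokenize(sql)).equals(sql));
    }
}
```

A parser that discards whitespace tokens or upper-cases keywords does less work per character, but it cannot pass this round-trip check. That is the difference in scope the comparison above is pointing at.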