
Error Recovery

Most SQL parsers treat a syntax error as total failure -- one unexpected token and you get nothing back. For any tool that needs to work with SQL as it's being written (IDEs, code agents, autocomplete) or with messy real-world SQL (migration analysis, batch linting), this is a dealbreaker.

Datoria's parser handles broken, incomplete, and malformed SQL gracefully, producing partial ASTs with precise error locations instead of failing outright. The valid portions of a query keep their full structure, type information, and lineage.

What You Get

  • Partial ASTs -- the correctly-parsed portions of a query are represented as normal typed AST nodes; only the broken portions become error nodes
  • Precise error positions -- each error carries an exact source position (or span), not just a line number
  • Multiple errors per statement -- the parser recovers and continues, finding as many errors as possible in a single pass
  • All tokens preserved -- even tokens inside error spans are retained in the AST, so the representation stays lossless
  • Error messages -- each error node carries a diagnostic message describing what went wrong
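The Python sketch below shows one way a tree with error nodes could be modeled and queried. The class names `Token`, `ErrorNode`, and `Node` and the helpers `iter_errors` and `iter_tokens` are hypothetical illustrations, not Datoria's API, and the sketch assumes 0-based character offsets with exclusive end positions. It demonstrates several properties from the list above: typed nodes for the valid part, any number of error nodes per tree, and skipped tokens kept inside error nodes so every token still maps back to the source.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, Union


@dataclass(frozen=True)
class Token:
    text: str
    start: int  # 0-based character offset of the first character

    @property
    def end(self) -> int:
        return self.start + len(self.text)  # exclusive end offset


@dataclass
class ErrorNode:
    message: str  # diagnostic describing what went wrong
    start: int    # exact source offset where the problem was detected
    tokens: list[Token] = field(default_factory=list)  # skipped tokens are kept, not dropped


@dataclass
class Node:
    kind: str  # e.g. "where", "comparison"
    children: list[Union[Node, ErrorNode, Token]] = field(default_factory=list)


Tree = Union[Node, ErrorNode, Token]


def iter_errors(tree: Tree) -> Iterator[ErrorNode]:
    """Yield every error node, so a single parse can report several errors."""
    if isinstance(tree, ErrorNode):
        yield tree
    elif isinstance(tree, Node):
        for child in tree.children:
            yield from iter_errors(child)


def iter_tokens(tree: Tree) -> Iterator[Token]:
    """Yield all tokens in tree order, including those inside error nodes."""
    if isinstance(tree, Token):
        yield tree
    elif isinstance(tree, ErrorNode):
        yield from tree.tokens
    else:
        for child in tree.children:
            yield from iter_tokens(child)


# Hand-built tree for a WHERE clause with one typo (offsets match `src`).
src = "WHERE a = 1 ANND b > 2"
where = Node("where", [
    Token("WHERE", 0),
    Node("comparison", [Token("a", 6), Token("=", 8), Token("1", 10)]),
    ErrorNode("unexpected token 'ANND'", 12,
              [Token("ANND", 12), Token("b", 17), Token(">", 19), Token("2", 21)]),
])

for err in iter_errors(where):
    print(err.start, err.message)  # 12 unexpected token 'ANND'
```

Because the error node keeps its tokens, checking `src[t.start:t.end] == t.text` for every token from `iter_tokens` succeeds even for the broken part of the clause.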

Try it in the playground -- the ANSI playground includes an error recovery example you can edit.

Example

Given this broken SQL with a typo in the WHERE clause:

```sql
SELECT customer_id, name, email
FROM customers
WHERE status = 'active'
ANND created_at > '2024-01-01'
ORDER BY name
```

The parser produces a full AST for the SELECT, FROM, and ORDER BY clauses. The WHERE clause contains a partial AST with the valid status = 'active' condition, plus an error node spanning ANND created_at > '2024-01-01' with a precise position pointing to the unexpected token ANND.

A parser without error recovery would reject the entire statement and return nothing, discarding the structure of every valid clause in the query.
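To illustrate how a parser can arrive at a result like the one described above, here is a toy recursive-descent parser for a WHERE condition that uses panic-mode recovery: on an unexpected token it records an error node, skips tokens until it reaches a synchronization token (`AND`, `OR`, or a keyword that can start the next clause), and resumes. This is a generic textbook technique shown for illustration; the grammar, the `WhereParser` class, and the synchronization set are assumptions, and Datoria's actual recovery strategy may differ.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

TOKEN_RE = re.compile(r"'[^']*'|\d+|[A-Za-z_]\w*|>=|<=|<>|[=<>]|\S")
IDENT = re.compile(r"[A-Za-z_]\w*")
LITERAL = re.compile(r"'[^']*'|\d+")
OPS = {"=", "<>", "<", ">", "<=", ">="}
CLAUSE_END = {"ORDER", "GROUP", "LIMIT"}  # keywords that may follow a WHERE clause
SYNC = CLAUSE_END | {"AND", "OR"}         # tokens where parsing can safely resume


@dataclass
class ErrorNode:
    message: str
    start: int                                  # offset of the unexpected token
    tokens: list = field(default_factory=list)  # skipped (text, offset) pairs, preserved


class WhereParser:
    """Toy parser for `<col> <op> <literal> ((AND|OR) ...)*` with panic-mode recovery."""

    def __init__(self, sql: str):
        self.toks = [(m.group(), m.start()) for m in TOKEN_RE.finditer(sql)]
        self.end = len(sql)
        self.i = 0
        self.errors: list[ErrorNode] = []

    def kw(self):
        """Upper-cased text of the current token, or None at end of input."""
        return self.toks[self.i][0].upper() if self.i < len(self.toks) else None

    def recover(self, message: str) -> ErrorNode:
        text, start = self.toks[self.i] if self.i < len(self.toks) else ("end of input", self.end)
        node = ErrorNode(f"{message}, found {text}", start)
        # Panic mode: skip tokens until one where parsing can resume, keeping them.
        while self.kw() is not None and self.kw() not in SYNC:
            node.tokens.append(self.toks[self.i])
            self.i += 1
        self.errors.append(node)
        return node

    def parse_comparison(self):
        t = [text for text, _ in self.toks[self.i:self.i + 3]]
        if (len(t) == 3 and IDENT.fullmatch(t[0]) and t[0].upper() not in SYNC
                and t[1] in OPS and LITERAL.fullmatch(t[2])):
            self.i += 3
            return ("cmp", t[0], t[1], t[2])
        return self.recover("expected <column> <operator> <literal>")

    def parse_condition(self) -> list:
        items = [self.parse_comparison()]
        while self.kw() is not None and self.kw() not in CLAUSE_END:
            if self.kw() in ("AND", "OR"):
                items.append(self.kw())
                self.i += 1
                items.append(self.parse_comparison())
            else:
                items.append(self.recover("expected AND, OR or end of WHERE clause"))
        return items


p = WhereParser("status = 'active' ANND created_at > '2024-01-01' ORDER BY name")
tree = p.parse_condition()
print(tree[0])        # ('cmp', 'status', '=', "'active'")
print(tree[1].start)  # 18
```

On the WHERE clause from the example, the sketch keeps `status = 'active'` as a normal comparison and produces one error node starting at `ANND` that retains the four skipped tokens. It stops at `ORDER`, so a caller could continue with the ORDER BY clause, and a second broken condition later in the input would simply add another error node.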

Zero Overhead on Valid SQL

Error recovery adds no performance cost when parsing valid SQL. The recovery mechanisms activate only when the parser encounters an unexpected token, so on the happy path parsing runs at full speed: the same 56 microseconds per file reported in the benchmarks.
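A minimal sketch of how recovery can stay off the happy path: the `expect` method below does one comparison and one advance when the token matches, and calls the recovery routine only on a mismatch. The `TokenCursor` class and its single-token-deletion heuristic are hypothetical and chosen for illustration, not taken from Datoria. Python timing is not the point here; the `recoveries` counter simply shows that valid input never reaches the slow path.

```python
from __future__ import annotations


class TokenCursor:
    """Token stream whose error handling runs only when a token does not match."""

    def __init__(self, tokens: list[str]):
        self.tokens = tokens
        self.pos = 0
        self.recoveries = 0  # how many times the slow path ran
        self.errors: list[str] = []

    def expect(self, expected: str) -> bool:
        # Happy path: one comparison and one increment, no extra bookkeeping.
        if self.pos < len(self.tokens) and self.tokens[self.pos] == expected:
            self.pos += 1
            return True
        return self._recover(expected)

    def _recover(self, expected: str) -> bool:
        # Slow path: reached only when the input does not match the grammar.
        self.recoveries += 1
        found = self.tokens[self.pos] if self.pos < len(self.tokens) else "<eof>"
        self.errors.append(f"expected {expected!r}, found {found!r} at token {self.pos}")
        # Single-token deletion: if the next token is the expected one, skip the stray token.
        if self.pos + 1 < len(self.tokens) and self.tokens[self.pos + 1] == expected:
            self.pos += 2
            return True
        return False  # otherwise report the expected token as missing and leave pos alone


rule = ["SELECT", "*", "FROM", "t"]  # hypothetical fixed token sequence to match

valid = TokenCursor(["SELECT", "*", "FROM", "t"])
print([valid.expect(tok) for tok in rule], valid.recoveries)  # [True, True, True, True] 0

broken = TokenCursor(["SELECT", "*", "junk", "FROM", "t"])
print([broken.expect(tok) for tok in rule], broken.errors)
# [True, True, True, True] ["expected 'FROM', found 'junk' at token 2"]
```

Keeping all recovery state changes inside `_recover` means the matching code path is identical to that of a parser with no recovery at all.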

Grammar-Driven

Error recovery is derived from the grammar definition, not hand-coded per dialect. This means:

  • All 15 dialects get error recovery automatically
  • New grammar additions immediately benefit from recovery
  • Recovery behavior is consistent and predictable across dialects
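One well-known way to derive recovery from a grammar, rather than hand-coding it, is to compute FOLLOW sets: the tokens that can legally appear after each nonterminal. When a rule fails, the parser can skip ahead to a token in that rule's FOLLOW set and resume. The sketch below computes FOLLOW sets for a tiny, hypothetical SELECT grammar. Whether Datoria uses FOLLOW sets or a different grammar-derived strategy is not stated here, so treat this as an illustration of the general idea only.

```python
from typing import Dict, List, Set, Tuple

Grammar = Dict[str, List[List[str]]]  # nonterminal -> productions; other symbols are terminals
END = "$"  # end-of-input marker


def first_sets(g: Grammar) -> Tuple[Dict[str, Set[str]], Set[str]]:
    """FIRST set of each nonterminal, plus the set of nullable nonterminals."""
    first: Dict[str, Set[str]] = {nt: set() for nt in g}
    nullable: Set[str] = set()
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for nt, prods in g.items():
            for prod in prods:
                for sym in prod:
                    add = first[sym] if sym in g else {sym}
                    if not add <= first[nt]:
                        first[nt] |= add
                        changed = True
                    if sym not in nullable:
                        break  # later symbols cannot start this production
                else:  # every symbol (or none at all) can derive the empty string
                    if nt not in nullable:
                        nullable.add(nt)
                        changed = True
    return first, nullable


def follow_sets(g: Grammar, start: str) -> Dict[str, Set[str]]:
    """FOLLOW set of each nonterminal: tokens that may appear right after it."""
    first, nullable = first_sets(g)
    follow: Dict[str, Set[str]] = {nt: set() for nt in g}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for nt, prods in g.items():
            for prod in prods:
                trailer = set(follow[nt])  # what can follow the current position
                for sym in reversed(prod):
                    if sym in g:
                        if not trailer <= follow[sym]:
                            follow[sym] |= trailer
                            changed = True
                        trailer = (trailer | first[sym]) if sym in nullable else set(first[sym])
                    else:
                        trailer = {sym}
    return follow


# Hypothetical toy grammar; an empty list is an empty (epsilon) production.
SQL: Grammar = {
    "query": [["SELECT", "columns", "FROM", "ident", "where", "order"]],
    "columns": [["ident", "more_cols"]],
    "more_cols": [[",", "ident", "more_cols"], []],
    "where": [["WHERE", "cond"], []],
    "cond": [["ident", "=", "literal"]],
    "order": [["ORDER", "BY", "ident"], []],
}

follow = follow_sets(SQL, "query")
print(sorted(follow["cond"]))  # ['$', 'ORDER']
```

For the condition inside WHERE, the computed synchronization set is `{'$', 'ORDER'}`, which is consistent with the example above, where the broken condition is skipped up to ORDER BY. Adding a production to the grammar updates these sets automatically, with no per-rule recovery code.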

Use Cases

IDE Support

Parse broken SQL as the user types. The partial AST provides enough structure for:

  • Syntax error highlighting at precise positions
  • Auto-completion using the correctly-parsed context
  • Go-to-definition and find-references for identifiers in well-formed parts of the query
  • Lineage and type information for the valid portions
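For syntax highlighting, an editor usually needs line and column positions rather than raw offsets. Assuming errors carry 0-based character offsets (an assumption; the exact position format is not specified here), a small helper can convert them with a binary search over line-start offsets. The hypothetical `diagnostic` function shapes the result like a Language Server Protocol diagnostic. LSP counts columns in UTF-16 code units by default, while this sketch counts Python characters, so non-ASCII text would need an extra conversion.

```python
from __future__ import annotations

import bisect


def line_starts(text: str) -> list[int]:
    """Offsets at which each line begins (line 0 starts at offset 0)."""
    starts = [0]
    for i, ch in enumerate(text):
        if ch == "\n":
            starts.append(i + 1)
    return starts


def to_position(starts: list[int], offset: int) -> tuple[int, int]:
    """Convert a 0-based character offset to a 0-based (line, column) pair."""
    line = bisect.bisect_right(starts, offset) - 1
    return line, offset - starts[line]


def diagnostic(text: str, start: int, end: int, message: str) -> dict:
    """Build an LSP-style diagnostic dict for the span [start, end)."""
    starts = line_starts(text)
    (l1, c1), (l2, c2) = to_position(starts, start), to_position(starts, end)
    return {
        "range": {"start": {"line": l1, "character": c1},
                  "end": {"line": l2, "character": c2}},
        "severity": 1,  # 1 = Error in LSP's DiagnosticSeverity
        "message": message,
    }


SQL = "\n".join([
    "SELECT customer_id, name, email",
    "FROM customers",
    "WHERE status = 'active'",
    "ANND created_at > '2024-01-01'",
    "ORDER BY name",
])
start = SQL.index("ANND")
print(diagnostic(SQL, start, start + len("ANND"), "unexpected token 'ANND'")["range"])
# {'start': {'line': 3, 'character': 0}, 'end': {'line': 3, 'character': 4}}
```

Computing `line_starts` once per document and reusing it keeps each conversion logarithmic in the number of lines, which matters when re-parsing on every keystroke.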

Batch Processing

When processing a large corpus of SQL files (e.g., during migration analysis), some files may contain syntax errors. Error recovery lets you:

  • Parse and analyze the valid portions of each file
  • Collect and report all errors without aborting the pipeline
  • Produce partial lineage and type information for imperfect queries
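A batch pipeline mainly needs to treat errors as data rather than exceptions. The sketch below walks a set of files, records every reported error per file, and never aborts the run. `check_parens` is only a stand-in for a real parser (it reports unbalanced parentheses and ignores string literals); `analyze_corpus` and `FileReport` are hypothetical names, and the `parse` callable is where a parser with error recovery would be plugged in.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FileReport:
    path: str
    errors: list[tuple[int, str]] = field(default_factory=list)  # (offset, message)

    @property
    def ok(self) -> bool:
        return not self.errors


def check_parens(sql: str) -> list[tuple[int, str]]:
    """Stand-in parser: reports every unbalanced parenthesis (ignores string literals)."""
    errors: list[tuple[int, str]] = []
    open_positions: list[int] = []
    for i, ch in enumerate(sql):
        if ch == "(":
            open_positions.append(i)
        elif ch == ")":
            if open_positions:
                open_positions.pop()
            else:
                errors.append((i, "unmatched ')'"))
    errors.extend((i, "unclosed '('") for i in open_positions)
    return sorted(errors)


def analyze_corpus(files: dict[str, str],
                   parse: Callable[[str], list[tuple[int, str]]] = check_parens) -> list[FileReport]:
    """Parse every file, collecting errors instead of stopping at the first bad file."""
    reports = []
    for path, sql in sorted(files.items()):
        try:
            errors = parse(sql)
        except Exception as exc:  # defensive: one crashing file must not stop the batch
            errors = [(0, f"parser raised {type(exc).__name__}: {exc}")]
        reports.append(FileReport(path, errors))
    return reports


corpus = {
    "a.sql": "SELECT count(*) FROM t",
    "b.sql": "SELECT (a + b FROM t WHERE f(x))) > (1",
}
for report in analyze_corpus(corpus):
    print(report.path, "ok" if report.ok else report.errors)
```

Here `b.sql` yields two errors from one pass, and `a.sql` is still analyzed normally alongside it.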

Lineage from Incomplete SQL

Even if a query has a syntax error in one clause, the rest of the query may be structurally complete. Error recovery allows the lineage engine to trace columns through the valid portions, providing useful (if incomplete) lineage information.
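The following sketch shows the idea of tracing through valid portions only. It walks a hand-built partial AST for the example query, represented as plain dictionaries (a hypothetical shape, not Datoria's AST), collects column references, and skips error nodes because nothing reliable can be said about their structure. It also reports whether the result is complete.

```python
from __future__ import annotations


def referenced_columns(node: dict) -> set[str]:
    """Collect column names from the well-formed parts of the tree."""
    if node["kind"] == "error":
        return set()  # structure inside an error span is unknown, so trace nothing there
    if node["kind"] == "column":
        return {node["name"]}
    cols: set[str] = set()
    for child in node.get("children", []):
        cols |= referenced_columns(child)
    return cols


def contains_error(node: dict) -> bool:
    return node["kind"] == "error" or any(contains_error(c) for c in node.get("children", []))


def lineage_summary(tree: dict) -> dict:
    return {"columns": sorted(referenced_columns(tree)), "complete": not contains_error(tree)}


# Hand-built partial AST for the example query (hypothetical shape).
tree = {"kind": "select", "children": [
    {"kind": "select_list", "children": [
        {"kind": "column", "name": "customer_id"},
        {"kind": "column", "name": "name"},
        {"kind": "column", "name": "email"},
    ]},
    {"kind": "from", "children": [{"kind": "table", "name": "customers"}]},
    {"kind": "where", "children": [
        {"kind": "eq", "children": [
            {"kind": "column", "name": "status"},
            {"kind": "literal", "value": "'active'"},
        ]},
        {"kind": "error", "tokens": ["ANND", "created_at", ">", "'2024-01-01'"]},
    ]},
    {"kind": "order_by", "children": [{"kind": "column", "name": "name"}]},
]}

print(lineage_summary(tree))
# {'columns': ['customer_id', 'email', 'name', 'status'], 'complete': False}
```

The result includes `status` from the valid part of the WHERE clause but not `created_at`, which sits inside the error span; that is the sense in which recovered lineage is useful but incomplete.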