Error Recovery

Most SQL parsers treat a syntax error as total failure: one unexpected token and you get nothing back. That's a problem for any tool that handles SQL as it's being written (IDEs, code agents, autocomplete) or runs over messy real-world SQL (migration analysis, batch linting).

Datoria's parser keeps going. Broken, incomplete, or malformed input produces a partial AST with precise error locations instead of an exception. The parts of the query that did parse retain full structure, typing, and lineage.

What you get

  • Partial ASTs. Correctly-parsed portions are normal typed AST nodes; only the broken portions become error nodes.
  • Precise error positions. Each error carries an exact source position or span, not just a line number.
  • Multiple errors per statement. The parser recovers and keeps going, surfacing as many errors as it can in a single pass.
  • All tokens preserved. Tokens inside error spans stay in the AST, so roundtripping still works.
  • Error messages. Each error node carries a diagnostic describing what went wrong.
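
To make the list above concrete, here is a minimal sketch of what a partial AST with error nodes could look like. It is not Datoria's actual data model: the class names (`Token`, `Node`, `ErrorNode`), the use of character offsets, and the helper functions are all assumptions made for illustration. The tree is built by hand rather than by a parser, purely to show how error nodes can sit alongside normal nodes while still keeping every token.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Token:
    text: str   # exact source text, including trailing whitespace
    start: int  # character offset of the first character


@dataclass
class ErrorNode:
    message: str         # diagnostic describing what went wrong
    start: int           # offset where the error span begins
    end: int             # offset just past the end of the error span
    tokens: List[Token]  # skipped tokens are kept, not thrown away


@dataclass
class Node:
    kind: str  # illustrative node kinds such as "Where" or "Comparison"
    children: List[Union["Node", ErrorNode, Token]] = field(default_factory=list)


def unparse(node) -> str:
    """Rebuild the source text by concatenating tokens in document order."""
    if isinstance(node, Token):
        return node.text
    if isinstance(node, ErrorNode):
        return "".join(t.text for t in node.tokens)
    return "".join(unparse(child) for child in node.children)


def collect_errors(node) -> List[ErrorNode]:
    """Walk the tree and return every error node, in document order."""
    if isinstance(node, ErrorNode):
        return [node]
    if isinstance(node, Token):
        return []
    errors: List[ErrorNode] = []
    for child in node.children:
        errors.extend(collect_errors(child))
    return errors


# Hand-built tree for a broken condition list (hypothetical shape).
source = "a = 1 ANND b > 2"
tree = Node("Where", [
    Node("Comparison", [Token("a ", 0), Token("= ", 2), Token("1 ", 4)]),
    ErrorNode(
        "unexpected token 'ANND'", 6, 16,
        [Token("ANND ", 6), Token("b ", 11), Token("> ", 13), Token("2", 15)],
    ),
])
```

Because the error node holds its tokens, `unparse(tree)` reproduces `source` exactly, and `collect_errors(tree)` returns the single error with its span.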

Try it in the playground. The ANSI playground includes an error recovery example you can edit.

Example

Given this broken SQL with a typo in the WHERE clause (ANND instead of AND):

SELECT customer_id, name, email
FROM customers
WHERE status = 'active'
ANND created_at > '2024-01-01'
ORDER BY name

The parser produces a full AST for the SELECT, FROM, and ORDER BY clauses. The WHERE clause contains a partial AST with the valid status = 'active' condition, plus an error node spanning ANND created_at > '2024-01-01' with a precise position pointing at the unexpected token ANND.

Most other parsers reject the whole statement and return nothing, losing the structure of the valid parts of the query.
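
The behavior described above can be imitated with a toy parser. The sketch below uses panic-mode recovery, a common textbook technique: on an unexpected token it skips ahead to the next clause keyword and records the skipped span as an error. This is not Datoria's implementation, whose recovery is derived from the grammar; the tokenizer, the `SYNC` keyword set, and the `parse_where` function are all hypothetical and handle only simple three-token comparisons.

```python
import re

SQL = """SELECT customer_id, name, email
FROM customers
WHERE status = 'active'
ANND created_at > '2024-01-01'
ORDER BY name"""

# Tokens: quoted strings, identifiers/keywords, or single punctuation characters.
TOKEN_RE = re.compile(r"'[^']*'|\w+|[^\s\w]")
# Keywords that start a new clause; the toy parser resynchronizes on these.
SYNC = {"SELECT", "FROM", "WHERE", "ORDER", "GROUP", "HAVING", "LIMIT"}


def tokenize(sql):
    """Return (text, start, end) triples with character offsets."""
    return [(m.group(), m.start(), m.end()) for m in TOKEN_RE.finditer(sql)]


def parse_where(tokens, i):
    """Parse `cmp (AND|OR cmp)*` starting at index i, just past WHERE.

    Returns (conditions, errors, next_index). On an unexpected token, skips
    to the next clause keyword and records the skipped span as one error.
    """
    conditions, errors = [], []
    while True:
        # A comparison is three tokens: left operand, operator, right operand.
        conditions.append(" ".join(t[0] for t in tokens[i:i + 3]))
        i += 3
        if i >= len(tokens) or tokens[i][0].upper() in SYNC:
            break  # the clause ended cleanly
        if tokens[i][0].upper() in ("AND", "OR"):
            i += 1  # consume the connective and parse the next comparison
            continue
        # Unexpected token: skip forward until the next clause keyword.
        first = i
        while i < len(tokens) and tokens[i][0].upper() not in SYNC:
            i += 1
        errors.append({
            "message": f"unexpected token {tokens[first][0]!r}",
            "start": tokens[first][1],
            "end": tokens[i - 1][2],
        })
        break
    return conditions, errors, i


tokens = tokenize(SQL)
where = next(i for i, t in enumerate(tokens) if t[0].upper() == "WHERE")
conditions, errors, next_i = parse_where(tokens, where + 1)
```

Here `conditions` holds the valid `status = 'active'` comparison, `errors` holds one entry whose span covers `ANND created_at > '2024-01-01'`, and `next_i` points at `ORDER`, so a caller could go on to parse the remaining clause normally.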

No overhead on valid SQL

Error recovery adds no cost when the SQL parses cleanly. The recovery paths fire only when the parser hits an unexpected token. On the happy path, parsing runs at full speed: the same 56 microseconds per file that the benchmarks measure.

Grammar-driven

Error recovery falls out of the grammar definition. It's not hand-coded per dialect, which means:

  • All 15 dialects get error recovery automatically.
  • New grammar additions inherit recovery for free.
  • Recovery behavior is consistent across dialects.

Use cases

IDE support

Parse broken SQL as the user types. The partial AST provides enough structure for:

  • Syntax error highlighting at precise positions
  • Auto-complete that uses the correctly-parsed context
  • Go-to-definition and find-references for identifiers in well-formed parts of the query
  • Lineage and type information on the valid portions
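
For syntax highlighting, an editor usually needs line and column positions rather than raw offsets. The sketch below assumes error spans arrive as 0-based character offsets, which is an assumption and not something the parser's API is known to guarantee; the helper names `line_starts`, `to_line_col`, and `underline` are made up for this example.

```python
from bisect import bisect_right

SQL = """SELECT customer_id, name, email
FROM customers
WHERE status = 'active'
ANND created_at > '2024-01-01'
ORDER BY name"""


def line_starts(source):
    """Offsets at which each line begins (line 1 starts at offset 0)."""
    starts = [0]
    for i, ch in enumerate(source):
        if ch == "\n":
            starts.append(i + 1)
    return starts


def to_line_col(starts, offset):
    """Convert a 0-based character offset to a 1-based (line, column) pair."""
    line = bisect_right(starts, offset) - 1
    return line + 1, offset - starts[line] + 1


def underline(source, start, end):
    """Render the offending line with carets under the span.

    Spans that run past the end of the line are clipped to that line.
    """
    starts = line_starts(source)
    line, col = to_line_col(starts, start)
    text = source.split("\n")[line - 1]
    width = min(end - start, len(text) - (col - 1))
    return f"{text}\n{' ' * (col - 1)}{'^' * max(width, 1)}"


# Hypothetical error span covering just the unexpected token ANND.
start = SQL.index("ANND")
end = start + len("ANND")
position = to_line_col(line_starts(SQL), start)
marker = underline(SQL, start, end)
```

Precomputing the line-start offsets once keeps each lookup to a binary search, which matters when the editor re-renders diagnostics on every keystroke.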

Batch processing

When you're running over a large corpus (migration analysis, linting an entire repo), some files will contain syntax errors. Error recovery lets you:

  • Parse and analyze the valid portions of each file
  • Collect and report all errors without aborting the pipeline
  • Produce partial lineage and type information for imperfect queries
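
A batch pipeline built on a recovering parser might look like the sketch below. The parser itself is passed in as a function, because the real API is not shown here: the `ParseFn` signature (source text in, partial AST plus a list of `(start, end, message)` tuples out), the `Diagnostic` record, and the `fake_parse` stand-in are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Diagnostic:
    path: str
    start: int
    end: int
    message: str


# Hypothetical parser signature: returns (partial_ast, [(start, end, message), ...]).
ParseFn = Callable[[str], Tuple[object, List[Tuple[int, int, str]]]]


def analyze_corpus(files: Dict[str, str], parse: ParseFn):
    """Parse every file, keeping partial ASTs and collecting all errors."""
    asts: Dict[str, object] = {}
    diagnostics: List[Diagnostic] = []
    for path, source in files.items():
        ast, errors = parse(source)
        asts[path] = ast  # kept even when the file has errors
        diagnostics.extend(Diagnostic(path, s, e, m) for s, e, m in errors)
    return asts, diagnostics


def fake_parse(source):
    """Stand-in for a recovering parser: flags the typo ANND and nothing else."""
    errors = []
    pos = source.find("ANND")
    if pos != -1:
        errors.append((pos, pos + 4, "unexpected token 'ANND'"))
    return {"source": source}, errors


files = {
    "ok.sql": "SELECT 1",
    "broken.sql": "SELECT a FROM t WHERE x = 1 ANND y = 2",
}
asts, diagnostics = analyze_corpus(files, fake_parse)
```

Since a recovering parser returns errors as data instead of raising, the loop needs no try/except to survive a bad file: both files yield an AST, and only `broken.sql` contributes a diagnostic.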

Lineage from incomplete SQL

A syntax error in one clause doesn't have to kill lineage for the rest of the query. The lineage engine traces columns through the valid portions and reports what it can, even if the result is incomplete.