Built for LLMs and Code Agents
LLMs generate SQL. Code agents modify SQL. Both need fast, structured feedback to know whether what they produced is correct. Datoria provides the tight feedback loop that turns "generate and hope" into "generate and verify."
The Problem
When an LLM generates a SQL query, you typically have two options: run it against a database (slow, expensive, requires credentials) or trust the output (risky). There's no middle ground -- no way to structurally validate the SQL, check types, trace lineage, or verify correctness without execution.
Code agents face the same problem at higher stakes. An agent modifying a dbt project needs to understand the impact of its changes across the model DAG. Without structural understanding, every edit is a guess.
What Datoria Provides
Instant Structural Validation
Parse generated SQL in 56 microseconds and know immediately:
- Is the SQL syntactically valid for the target dialect?
- If not, where exactly are the errors? (precise positions, not just "syntax error")
- Does it use constructs that exist in the target dialect? (no BigQuery STRUCT in a PostgreSQL query)
With error recovery, even partially broken SQL yields a useful partial AST -- the agent can see which parts are valid and which need fixing.
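To make the shape of that feedback concrete, here is a toy sketch (not Datoria's API) of a recoverable check that reports precise positions rather than a bare "syntax error". It validates a single simple rule, balanced parentheses, and returns diagnostics an agent could act on:

```python
# Toy sketch of recoverable validation -- illustrative only, not
# Datoria's implementation. The point is the output shape: precise
# (offset, message) diagnostics instead of a bare "syntax error".

def diagnostics(sql: str) -> list[tuple[int, str]]:
    """Return (offset, message) pairs; an empty list means the check passed."""
    errors, stack = [], []
    for i, ch in enumerate(sql):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if stack:
                stack.pop()
            else:
                errors.append((i, "unmatched ')'"))
    errors.extend((i, "unclosed '('") for i in stack)
    return errors

print(diagnostics("SELECT SUM(amount FROM orders"))
# [(10, "unclosed '('")] -- the agent knows exactly where to fix
```

A real parser with error recovery goes much further (it also returns the partial AST), but the agent-facing contract is the same: structured diagnostics with positions.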
Type Checking Without a Database
Type inference resolves the data type of every expression, column, and function call -- without connecting to a warehouse:
- Does SUM(string_column) make sense? The type system catches it.
- Does a UNION have compatible branch types? Verified statically.
- Does an INSERT's value list match the target column types? Checked.
This gives LLMs and agents compile-time confidence about generated SQL correctness.
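The checks above can be pictured with a miniature example. This is a hand-rolled sketch in the spirit of those checks, not Datoria's type system; the schema dict stands in for catalog metadata that Datoria would resolve without a warehouse connection:

```python
# Illustrative miniature type check -- not Datoria's API. A schema
# dict plays the role of catalog metadata; no warehouse is involved.

SCHEMA = {"orders": {"amount": "NUMERIC", "status": "STRING"}}

def infer_sum(table: str, column: str) -> str:
    """Type-check SUM(column) against the schema; raise on a type error."""
    col_type = SCHEMA[table][column]
    if col_type not in ("NUMERIC", "INT64", "FLOAT64"):
        raise TypeError(f"SUM({column}): expected a numeric type, got {col_type}")
    return col_type

infer_sum("orders", "amount")   # "NUMERIC" -- passes
# infer_sum("orders", "status") raises:
# TypeError: SUM(status): expected a numeric type, got STRING
```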
Column Lineage as Context
An agent modifying a dbt model needs to understand what it's affecting. Datoria's column lineage answers:
- "What does this model depend on?" -- every source column, through every CTE and JOIN
- "What breaks if I change this column?" -- reverse lineage across the model DAG
- "What does SELECT * actually expand to?" -- concrete column lists with types
This is exactly the context an agent needs to make safe, targeted modifications. Instead of feeding the agent an entire project, you can give it precise dependency information: "this column comes from stg_orders.amount through a SUM aggregation, and is consumed by 3 downstream models."
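Reverse lineage is, at its core, reachability over a column-level dependency graph. A minimal sketch, with a hand-written edge map (Datoria derives these edges from the SQL itself; the model and column names here are invented):

```python
# Reverse lineage as graph reachability -- a sketch. The CONSUMERS
# edge map is hand-written for illustration; Datoria extracts it
# from the SQL of each model.

from collections import deque

# column -> columns that consume it (downstream edges)
CONSUMERS = {
    "stg_orders.amount": ["fct_orders.total"],
    "fct_orders.total": ["rpt_revenue.monthly_total", "rpt_margin.gross"],
}

def downstream(column: str) -> set[str]:
    """Every column affected, transitively, by a change to `column`."""
    seen, queue = set(), deque([column])
    while queue:
        for nxt in CONSUMERS.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

print(sorted(downstream("stg_orders.amount")))
# ['fct_orders.total', 'rpt_margin.gross', 'rpt_revenue.monthly_total']
```

This is the "what breaks if I change this column?" answer in executable form: three downstream columns, found without running any SQL.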
Scope Resolution for Precise Context
The optimizer's scope resolution qualifies every column reference to its source table. For an LLM writing SQL, this means:
- You can validate that referenced columns actually exist in the referenced tables
- You can provide accurate autocomplete suggestions based on what's in scope
- You can detect ambiguous column references before they become runtime errors
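The existence and ambiguity checks can be sketched in a few lines. This is a toy resolver over invented table schemas, not Datoria's scope graph; it shows the three outcomes an LLM-facing validator needs to distinguish:

```python
# Toy scope resolution -- illustrative only. SCOPE stands in for the
# tables visible at a given point in the query.

SCOPE = {"orders": {"id", "amount"}, "customers": {"id", "name"}}

def resolve(column: str) -> str:
    """Qualify a bare column name, or raise the error it would hit at runtime."""
    owners = [t for t, cols in SCOPE.items() if column in cols]
    if not owners:
        raise NameError(f"unknown column: {column}")
    if len(owners) > 1:
        raise NameError(f"ambiguous column: {column} (in {', '.join(sorted(owners))})")
    return f"{owners[0]}.{column}"

print(resolve("amount"))  # orders.amount
# resolve("id") raises: ambiguous column: id (in customers, orders)
# resolve("total") raises: unknown column: total
```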
Formatting for Consistent Output
LLM-generated SQL is often poorly formatted -- inconsistent indentation, missing newlines, mixed casing. The formatter normalizes output to a consistent style in one pass, making generated SQL readable and review-friendly.
The LSP Connection
These capabilities map directly to what a Language Server Protocol (LSP) implementation needs:
| LSP Feature | Datoria Capability |
|---|---|
| Diagnostics (red squiggles) | Error recovery with precise positions |
| Hover (type info) | Type inference for every expression |
| Go to definition | Scope resolution and column qualification |
| Completion | Scope graph knows what's available at any position |
| Code actions (quick fixes) | AST transformation with lossless roundtripping |
| Formatting | AST-aware formatter with 27 options |
| References (find usages) | Column lineage traces all consumers |
Whether the "client" is a human in an IDE or an LLM agent, the underlying capabilities are the same. The difference is that an agent can consume them programmatically and at the speed of the parser (~0.3 ms per model for the full analysis pipeline).
Performance for Agent Loops
Agent workflows are iterative: generate → validate → fix → validate → fix. Each iteration needs to be fast enough not to bottleneck the loop.
- Parse: 56 microseconds per file
- Full analysis (parse + optimize + lineage + types): ~0.3 ms per model
- 10,000 models: under 3 seconds for the entire project
This means an agent can analyze the impact of a proposed change across an entire dbt project in seconds, not minutes. The feedback loop is tight enough for interactive agent workflows.
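The generate → validate → fix loop itself is simple to sketch. `generate` and `validate` here are stand-ins: in a real loop, `generate` calls an LLM and `validate` calls a structural validator of the kind described above.

```python
# The agent loop, sketched. `generate` and `validate` are stand-ins
# for an LLM call and a structural validator respectively.

def agent_loop(generate, validate, max_iters: int = 5) -> str:
    sql = generate(None)                 # first attempt, no feedback yet
    for _ in range(max_iters):
        errors = validate(sql)
        if not errors:
            return sql                   # verified, not hoped-for
        sql = generate(errors)           # feed diagnostics back into generation
    raise RuntimeError("could not produce valid SQL within the iteration budget")

# Usage with trivial stand-ins: the second attempt fixes the typo.
attempts = iter(["SELEC 1", "SELECT 1"])
result = agent_loop(
    lambda errs: next(attempts),
    lambda sql: [] if sql.startswith("SELECT") else ["syntax error at 0"],
)
print(result)  # SELECT 1
```

At sub-millisecond validation cost, the loop's latency is dominated entirely by generation, which is exactly where you want the time to go.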