dbt support

Datoria reads dbt projects natively. It loads project structure, resolves the model DAG, compiles Jinja templates to SQL, and runs the full analysis pipeline (parsing, optimization, lineage, type inference) over every model. The pipeline is tested against 59 public dbt projects with 9,925 models across all major adapters.

What works

Point Datoria at a dbt project and it:

Loads the project. Reads dbt_project.yml, profiles.yml, source definitions, and installed packages.
Resolves the DAG. Tracks every ref() and source() call to determine model ordering.
Compiles Jinja. Evaluates templates to pure SQL, including macros, conditionals, loops, and adapter dispatch.
Parses the SQL. Uses the dialect parser that matches the adapter (BigQuery, Snowflake, PostgreSQL, and so on).
Runs the semantic stack. Scope resolution, column qualification, star expansion, type inference, and column lineage.

All of this runs at roughly 0.3 ms per model, so a 10,000-model project analyzes in about 3 seconds. That is interactive speed, not batch.
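
To make the DAG step concrete, here is a minimal sketch of how ref() calls can determine model ordering. It is not Datoria's implementation: the regex, the `model_order` helper, and the sample models are assumptions for illustration, and a real resolver works on the parsed Jinja rather than on raw text.

```python
import re
from graphlib import TopologicalSorter

# Simplified matcher for ref('name') / ref("name"); illustration only.
# It ignores the two-argument ref(package, model) form.
REF_PATTERN = re.compile(r"ref\(\s*['\"]([^'\"]+)['\"]\s*\)")

def model_order(models: dict[str, str]) -> list[str]:
    """Return model names ordered so that every model follows the models it refs."""
    # Map each model to the set of models it depends on.
    graph = {name: set(REF_PATTERN.findall(sql)) for name, sql in models.items()}
    # TopologicalSorter yields dependencies before their dependents.
    return list(TopologicalSorter(graph).static_order())

# Hypothetical three-model project.
models = {
    "customer_orders": (
        "SELECT * FROM {{ ref('stg_orders') }} "
        "JOIN {{ ref('stg_customers') }} USING (customer_id)"
    ),
    "stg_orders": "SELECT * FROM {{ source('shop', 'orders') }}",
    "stg_customers": "SELECT * FROM {{ source('shop', 'customers') }}",
}

order = model_order(models)
print(order)
```

Both staging models appear before customer_orders in the result; their order relative to each other is not significant.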

Cross-model lineage

When model B selects from {{ ref('model_a') }}, lineage traces back through model A to the source tables. Arbitrary model depth works (staging → intermediate → mart), so you get column-level provenance across the project.

Across the 59-project test suite: 134,295 output columns traced, 92.4% fully resolved to source origins.
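
The idea of tracing a column through several models can be sketched as a recursive walk over per-model lineage maps. The `LINEAGE` structure, the model names, and the `trace` function below are hypothetical and only illustrate the concept; they are not Datoria's data model.

```python
# Hypothetical per-model lineage: output column -> direct upstream (relation, column) pairs.
LINEAGE = {
    "stg_orders": {
        "order_id": [("raw.orders", "id")],
        "amount": [("raw.orders", "amount_cents")],
    },
    "int_orders": {
        "order_id": [("stg_orders", "order_id")],
        "amount": [("stg_orders", "amount")],
    },
    "mart_revenue": {
        "total_spent": [("int_orders", "amount")],
    },
}

def trace(relation: str, column: str) -> set[tuple[str, str]]:
    """Follow a column upstream until it reaches relations that are not models."""
    if relation not in LINEAGE:
        # Not a model, so treat it as a source table: this is an origin.
        return {(relation, column)}
    origins: set[tuple[str, str]] = set()
    for upstream_relation, upstream_column in LINEAGE[relation].get(column, []):
        origins |= trace(upstream_relation, upstream_column)
    return origins

origins = trace("mart_revenue", "total_spent")
print(origins)
```

In this sketch an empty result means the column could not be traced to any source, which is the kind of case counted as unresolved.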

Example

-- models/customer_orders.sql
{{ config(materialized='table') }}

WITH orders AS (
  SELECT * FROM {{ ref('stg_orders') }}
)

SELECT
  c.customer_id,
  c.name,
  COUNT(o.order_id) AS order_count,
  SUM(o.amount) AS total_spent
FROM {{ ref('stg_customers') }} c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name

Datoria compiles the Jinja, parses the SQL, expands SELECT * in the CTE, qualifies every column, and traces each output: customer_id and name come from stg_customers, order_count comes from stg_orders.order_id, and total_spent comes from stg_orders.amount, through the CTE and the JOIN.

dbt functions

Function	Support
`ref(model_name)`	Resolves model references, tracks DAG dependencies
`source(source_name, table_name)`	Resolves source definitions from YAML
`config(key=value)`	Model configuration (materialization, schema, alias, tags)
`var(name, default)`	Project variables from `dbt_project.yml`
`env_var(name, default)`	Environment variable lookup
Adapter dispatch	Adapter-specific macro implementations (BigQuery vs PostgreSQL variants)

Macro system

dbt macros are discovered and registered automatically:

Project macros from the project's macros/ directory
Package macros from installed packages in dbt_packages/
Namespace resolution. Project macros take precedence over packages; packages resolve in declaration order.
Adapter dispatch. Looks up adapter-specific macro variants before falling back to the default.
Eager parsing, lazy evaluation. Macros are parsed on discovery but only evaluated when called.
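
The lookup rules above can be sketched as a small resolver. The `adapter__name` and `default__name` prefixes follow dbt's dispatch naming convention; everything else here (the `resolve_macro` function, the namespace layout, and the choice to check all variants in one namespace before moving to the next) is an assumption for illustration, not a description of Datoria's internals.

```python
from typing import Optional

def resolve_macro(
    name: str,
    adapter: str,
    namespaces: list[tuple[str, dict[str, str]]],
) -> Optional[tuple[str, str]]:
    """Find a macro implementation.

    namespaces is a list of (namespace_name, {macro_name: body}) pairs in
    precedence order: the project first, then packages in declaration order.
    """
    for namespace_name, macros in namespaces:
        # Assumed order: adapter-specific variant first, then the default variant.
        for candidate in (f"{adapter}__{name}", f"default__{name}"):
            if candidate in macros:
                return namespace_name, candidate
    return None

# Hypothetical project and package macros; bodies are elided.
namespaces = [
    ("my_project", {
        "bigquery__cents_to_dollars": "...",
        "default__cents_to_dollars": "...",
    }),
    ("some_package", {
        "default__cents_to_dollars": "...",
        "default__date_spine": "...",
    }),
]

print(resolve_macro("cents_to_dollars", "bigquery", namespaces))
print(resolve_macro("cents_to_dollars", "postgres", namespaces))
print(resolve_macro("date_spine", "postgres", namespaces))
```

The project's BigQuery variant is chosen on BigQuery, its default variant on PostgreSQL, and a macro defined only in a package resolves to that package.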

15 adapter types

Each dbt adapter maps to a SQL dialect with appropriate type mappings and quoting rules:

BigQuery, PostgreSQL, Redshift, Snowflake, DuckDB, Spark, Databricks, T-SQL, Oracle, DB2, MariaDB, Trino, Presto, ANSI, SQLite.

Jinja template language

The Jinja compiler is generated from a grammar definition (same approach as the SQL parsers). It handles the full Jinja2 template language:

Control flow: if/elif/else, for loops with filters and recursion, break/continue
Macros: definitions with parameters and defaults, call blocks for higher-order macros
Template composition: extends (inheritance), import/from...import, include, block
Expressions: arithmetic, comparison, logical operators, list/dict literals, filter pipes, dot access, subscript
Scoping: set (assignment, tuple unpacking, namespace dot notation, set blocks), with blocks
Output control: raw blocks (literal output), autoescape, whitespace trim markers ({%-, -%})

Test coverage

The pipeline is tested against 59 public dbt projects containing 9,925 models across all major adapters: dbt community projects, official examples, and production-style repositories. Tests verify:

Jinja compilation success (zero crashes across 9,925 models)
SQL parse correctness
SELECT * expansion
Per-column lineage resolution (134,295 columns, 92.4% resolved)
Per-column type inference

Limitations

Datoria analyzes dbt projects statically. It compiles Jinja and analyzes SQL without executing queries against a database.

No runtime query execution. Analysis is compile-time only. Schema information comes from source definitions and upstream models, not from the database.
is_incremental() always returns false. There is no runtime state to determine whether a model is running incrementally. Queries are analyzed as full-refresh.
var() values must be provided. Project variables must be set in dbt_project.yml or passed explicitly. There is no interactive prompt or database connection to resolve them at runtime.
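
As a rough illustration of the var() limitation, the sketch below resolves a variable from explicitly passed values, then from project variables, then from the call's default, and fails otherwise. The `resolve_var` function and the precedence of explicit values over dbt_project.yml are assumptions modeled on dbt's usual behavior, not a documented Datoria API.

```python
_MISSING = object()  # Sentinel so that None can be a legitimate default.

def resolve_var(name, project_vars, overrides, default=_MISSING):
    """Resolve a dbt-style var() statically, with no prompt or database lookup."""
    # Assumed precedence: explicitly passed values win over dbt_project.yml.
    for scope in (overrides, project_vars):
        if name in scope:
            return scope[name]
    if default is not _MISSING:
        return default
    raise KeyError(f"var '{name}' is not set and has no default")

# Hypothetical values.
project_vars = {"start_date": "2020-01-01"}
overrides = {"start_date": "2024-01-01"}

print(resolve_var("start_date", project_vars, {}))
print(resolve_var("start_date", project_vars, overrides))
print(resolve_var("currency", project_vars, {}, default="USD"))
```

A variable with neither a configured value nor a default raises an error in this sketch, because nothing can supply it at analysis time.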
