dbt support

Datoria reads dbt projects natively. It loads project structure, resolves the model DAG, compiles Jinja templates to SQL, and runs the full analysis pipeline (parsing, optimization, lineage, type inference) over every model. The pipeline is tested against 59 public dbt projects with 9,925 models across all major adapters.

What works

Point Datoria at a dbt project and it:

  1. Loads the project. Reads dbt_project.yml, profiles.yml, source definitions, and installed packages.
  2. Resolves the DAG. Tracks every ref() and source() call to determine model ordering.
  3. Compiles Jinja. Evaluates templates to pure SQL, including macros, conditionals, loops, and adapter dispatch.
  4. Parses the SQL using the correct dialect parser (BigQuery, Snowflake, PostgreSQL, and so on).
  5. Runs the semantic stack: scope resolution, column qualification, star expansion, type inference, column lineage.
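
Step 2 can be illustrated with a minimal sketch. It is not Datoria's implementation: the regex, `find_refs`, and `model_order` are hypothetical names, and only the single-argument form of ref() is recognized. It assumes model ordering is a topological sort of the ref() graph, done here with Python's standard `graphlib` module.

```python
import re
from graphlib import TopologicalSorter

# Matches {{ ref('name') }} or {{ ref("name") }}.
# The two-argument form of ref() is not handled in this sketch.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")


def find_refs(model_sql: str) -> set[str]:
    """Return the model names referenced through ref() in one model file."""
    return set(REF_PATTERN.findall(model_sql))


def model_order(models: dict[str, str]) -> list[str]:
    """Order models so that each one comes after every model it refs."""
    graph = {name: find_refs(sql) for name, sql in models.items()}
    return list(TopologicalSorter(graph).static_order())


# Hypothetical three-model project.
models = {
    "customer_orders": (
        "SELECT * FROM {{ ref('stg_orders') }} "
        "JOIN {{ ref('stg_customers') }} USING (customer_id)"
    ),
    "stg_orders": "SELECT * FROM raw.orders",
    "stg_customers": "SELECT * FROM raw.customers",
}
order = model_order(models)
```

In `order`, both staging models appear before `customer_orders`. A cycle in the ref() graph would make `static_order()` raise `graphlib.CycleError`.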

All of this runs at roughly 0.3 ms per model, so a 10,000-model project analyzes in about 3 seconds (10,000 × 0.3 ms). Interactive speed, not batch.

Cross-model lineage

When model B selects from {{ ref('model_a') }}, lineage traces through model A all the way back to the source tables. This works at arbitrary model depth (staging → intermediate → mart), so you get column-level provenance across the whole project.
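
The idea of tracing a column through several model layers can be sketched as a recursive lookup. Everything here is an assumption made for illustration: the `MODEL_LINEAGE` structure, the model and column names, and the rule that any relation without an entry counts as a source table. Datoria's internal representation is not described in this page.

```python
# Hypothetical per-model lineage: output column -> list of (upstream relation, column).
# Relations with no entry in MODEL_LINEAGE are treated as source tables.
MODEL_LINEAGE = {
    "stg_orders": {
        "order_id": [("raw.orders", "id")],
        "amount": [("raw.orders", "amount_cents")],
    },
    "int_order_totals": {
        "total_spent": [("stg_orders", "amount")],
    },
    "mart_customers": {
        "lifetime_value": [("int_order_totals", "total_spent")],
    },
}


def trace_to_sources(relation, column, lineage=MODEL_LINEAGE):
    """Follow a column upstream until it reaches relations that are not models."""
    if relation not in lineage:
        return {(relation, column)}  # reached a source table
    origins = set()
    for parent_relation, parent_column in lineage[relation].get(column, []):
        origins |= trace_to_sources(parent_relation, parent_column, lineage)
    return origins


origins = trace_to_sources("mart_customers", "lifetime_value")
```

Here `origins` contains the single pair `("raw.orders", "amount_cents")`, reached through the mart, intermediate, and staging layers. A column with no recorded parents yields an empty set, which would correspond to a column that is not fully resolved.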

Across the 59-project test suite: 134,295 output columns traced, 92.4% fully resolved to source origins.

Example

-- models/customer_orders.sql
{{ config(materialized='table') }}

WITH orders AS (
    SELECT * FROM {{ ref('stg_orders') }}
)

SELECT
    c.customer_id,
    c.name,
    COUNT(o.order_id) AS order_count,
    SUM(o.amount) AS total_spent
FROM {{ ref('stg_customers') }} c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name

Datoria compiles the Jinja, parses the SQL, expands SELECT * in the CTE, qualifies every column, and traces each output. customer_id and name come from stg_customers. order_count comes from stg_orders.order_id and total_spent from stg_orders.amount, both through the CTE and the JOIN.
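
Star expansion depends on knowing the upstream model's columns. The sketch below shows the idea under stated assumptions: the column list for stg_orders is invented (the example does not give it), and `expand_star` is a hypothetical helper, not a Datoria API.

```python
# Hypothetical upstream schema; the real columns of stg_orders are not given in the text.
UPSTREAM_COLUMNS = {"stg_orders": ["order_id", "customer_id", "amount"]}


def expand_star(relation, alias=None, schemas=UPSTREAM_COLUMNS):
    """Replace SELECT * over one relation with its qualified column list."""
    qualifier = alias or relation
    return [f"{qualifier}.{column}" for column in schemas[relation]]


expanded = expand_star("stg_orders")
```

With this schema, `expanded` is the list `stg_orders.order_id`, `stg_orders.customer_id`, `stg_orders.amount`. Once the CTE's columns are explicit, references such as o.order_id and o.amount in the outer query can be matched to them.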

dbt functions

| Function | Support |
| --- | --- |
| ref(model_name) | Resolves model references, tracks DAG dependencies |
| source(source_name, table_name) | Resolves source definitions from YAML |
| config(key=value) | Model configuration (materialization, schema, alias, tags) |
| var(name, default) | Project variables from dbt_project.yml |
| env_var(name, default) | Environment variable lookup |
| Adapter dispatch | Adapter-specific macro implementations (BigQuery vs PostgreSQL variants) |
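
The var(name, default) and env_var(name, default) signatures suggest a lookup with an optional fallback. A minimal sketch follows, assuming that a missing value without a default is an error (as it is in dbt itself). The `project_vars` and `environ` parameters are hypothetical, added so the example is self-contained.

```python
import os

_MISSING = object()  # sentinel so that None can still be used as a real default


def var(name, default=_MISSING, project_vars=None):
    """Look up a project variable, falling back to the default if one is given."""
    project_vars = project_vars or {}
    if name in project_vars:
        return project_vars[name]
    if default is not _MISSING:
        return default
    raise KeyError(f"var '{name}' is not set and has no default")


def env_var(name, default=_MISSING, environ=None):
    """Look up an environment variable, falling back to the default if one is given."""
    environ = os.environ if environ is None else environ
    if name in environ:
        return environ[name]
    if default is not _MISSING:
        return default
    raise KeyError(f"env var '{name}' is not set and has no default")


start_date = var("start_date", "2020-01-01")
schema = env_var("DBT_SCHEMA", "dev", environ={})
```

Both calls fall back to their defaults here because neither value is set. A value present in `project_vars` or the environment takes precedence over the default.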

Macro system

dbt macros are discovered and registered automatically:

  • Project macros from the project's macros/ directory
  • Package macros from installed packages in dbt_packages/
  • Namespace resolution. Project macros take precedence over packages; packages resolve in declaration order.
  • Adapter dispatch. Looks up adapter-specific macro variants before falling back to the default.
  • Eager parsing, lazy evaluation. Macros are parsed on discovery but only evaluated when called.
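
The precedence and dispatch rules above can be sketched as two small lookups. This is an illustration, not Datoria's code: `find_macro`, `dispatch`, and the macro bodies are hypothetical, and the exact search order between namespaces and adapter variants is an assumption. The `<adapter>__<name>` and `default__<name>` naming follows dbt's dispatch convention. Macros are stored as callables and only run when invoked, which mirrors the lazy-evaluation point.

```python
def find_macro(name, namespaces):
    """Return the first macro called `name`; earlier namespaces take precedence."""
    for macros in namespaces:
        if name in macros:
            return macros[name]
    return None


def dispatch(name, adapter, namespaces):
    """Prefer the adapter-specific variant, then fall back to the default one."""
    for candidate in (f"{adapter}__{name}", f"default__{name}"):
        macro = find_macro(candidate, namespaces)
        if macro is not None:
            return macro
    raise LookupError(f"no implementation of '{name}' for adapter '{adapter}'")


# Hypothetical macros: the project defines only a default variant,
# while an installed package also ships a BigQuery-specific one.
project_macros = {"default__current_ts": lambda: "CURRENT_TIMESTAMP"}
package_macros = {
    "bigquery__current_ts": lambda: "CURRENT_TIMESTAMP()",
    "default__current_ts": lambda: "NOW()",
}
namespaces = [project_macros, package_macros]  # project first, then packages in order

bigquery_sql = dispatch("current_ts", "bigquery", namespaces)()
postgres_sql = dispatch("current_ts", "postgres", namespaces)()
```

For BigQuery the package's adapter-specific variant is chosen. For PostgreSQL no specific variant exists, so the lookup falls back to the default variant, and the project's definition wins over the package's.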

15 adapter types

Each dbt adapter maps to a SQL dialect with appropriate type mappings and quoting rules:

BigQuery, PostgreSQL, Redshift, Snowflake, DuckDB, Spark, Databricks, T-SQL, Oracle, DB2, MariaDB, Trino, Presto, ANSI, SQLite.
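
As an illustration of dialect-specific quoting rules, the sketch below quotes an identifier for a few of the listed dialects. The dialect keys and the `quote_identifier` helper are hypothetical. The quote characters are the standard ones for each engine. Identifiers that already contain a quote character are rejected rather than escaped, since escaping rules differ between dialects.

```python
# Opening and closing identifier quote characters for a few dialects.
QUOTE_CHARS = {
    "bigquery": ("`", "`"),
    "postgres": ('"', '"'),
    "snowflake": ('"', '"'),
    "tsql": ("[", "]"),
    "mariadb": ("`", "`"),
}


def quote_identifier(identifier, dialect):
    """Wrap an identifier in the dialect's quote characters."""
    open_quote, close_quote = QUOTE_CHARS[dialect]
    if open_quote in identifier or close_quote in identifier:
        # Escaping differs per dialect and is out of scope for this sketch.
        raise ValueError(f"identifier {identifier!r} contains a quote character")
    return f"{open_quote}{identifier}{close_quote}"


quoted = {dialect: quote_identifier("order", dialect) for dialect in QUOTE_CHARS}
```

The same column name, `order`, is rendered with backticks for BigQuery and MariaDB, double quotes for PostgreSQL and Snowflake, and square brackets for T-SQL.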

Jinja template language

The Jinja compiler is generated from a grammar definition (same approach as the SQL parsers). It handles the full Jinja2 template language:

  • Control flow: if/elif/else, for loops with filters and recursion, break/continue
  • Macros: definitions with parameters and defaults, call blocks for higher-order macros
  • Template composition: extends (inheritance), import/from...import, include, block
  • Expressions: arithmetic, comparison, logical operators, list/dict literals, filter pipes, dot access, subscript
  • Scoping: set (assignment, tuple unpacking, namespace dot notation, set blocks), with blocks
  • Output control: raw blocks (literal output), autoescape, whitespace trim markers ({%-, -%})

Test coverage

The pipeline is tested against 59 public dbt projects containing 9,925 models across all major adapters. The corpus includes dbt community projects, official examples, and production-style repositories. Tests verify:

  • Jinja compilation success (zero crashes across 9,925 models)
  • SQL parse correctness
  • SELECT * expansion
  • Per-column lineage resolution (134,295 columns, 92.4% resolved)
  • Per-column type inference

Limitations

Datoria analyzes dbt projects statically. It compiles Jinja and analyzes SQL without executing queries against a database.

  • No runtime query execution. Analysis is compile-time only. Schema information comes from source definitions and upstream models, not from the database.
  • is_incremental() always returns false. There is no runtime state to determine whether a model is running incrementally. Queries are analyzed as full-refresh.
  • var() values must be provided. Project variables must be set in dbt_project.yml or passed explicitly. There is no interactive prompt or database connection to resolve them at runtime.
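
The effect of is_incremental() always returning false can be shown with a small sketch. It does not use a real Jinja compiler: a regex simply drops `{% if is_incremental() %} ... {% endif %}` blocks, which is what evaluating the condition as false amounts to in the simplest case. Nested blocks and `{% else %}` branches are not handled, and `as_full_refresh` is a hypothetical name.

```python
import re

# Matches a simple {% if is_incremental() %} ... {% endif %} block,
# with or without whitespace trim markers ({%- and -%}).
INCREMENTAL_BLOCK = re.compile(
    r"\{%-?\s*if\s+is_incremental\(\)\s*-?%\}.*?\{%-?\s*endif\s*-?%\}",
    re.DOTALL,
)


def as_full_refresh(model_sql: str) -> str:
    """Drop incremental-only blocks, as if is_incremental() returned false."""
    return INCREMENTAL_BLOCK.sub("", model_sql)


# Hypothetical incremental model.
model_sql = """SELECT * FROM events
{% if is_incremental() %}
WHERE event_time > (SELECT MAX(event_time) FROM {{ this }})
{% endif %}"""

full_refresh_sql = as_full_refresh(model_sql)
```

The incremental filter disappears, leaving only the unfiltered SELECT. This is the query shape that static analysis sees, so lineage and types reflect the full-refresh form of the model.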