dbt Support

Datoria reads dbt projects natively -- loading project structure, resolving the model DAG, compiling Jinja templates to SQL, and running the full analysis pipeline (parsing, optimization, lineage, type inference) across every model. Tested against 59 public dbt projects with 9,925 models across all major adapters.

What Works

Point Datoria at a dbt project and it:

  1. Loads the project -- reads dbt_project.yml, profiles.yml, source definitions, and installed packages
  2. Resolves the DAG -- tracks every ref() and source() call to determine model ordering
  3. Compiles Jinja -- evaluates templates to pure SQL, including macros, conditionals, loops, and adapter dispatch
  4. Parses the SQL -- using the correct dialect parser (BigQuery, Snowflake, PostgreSQL, etc.)
  5. Runs the semantic stack -- scope resolution, column qualification, star expansion, type inference, column lineage

The full pipeline runs at ~0.3 ms per model, so a 10,000-model project analyzes in about 3 seconds. This is interactive speed, not batch.
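To make step 2 concrete, here is a minimal sketch of DAG resolution in Python. It is not Datoria's implementation: it finds ref() calls with a regular expression instead of compiling Jinja, handles only the one-argument form of ref(), and the model bodies in EXAMPLE_MODELS are made up for illustration.

```python
import re
from graphlib import TopologicalSorter

# Matches {{ ref('model') }} with single or double quotes (one-argument form only).
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([^'\"]+)['\"]\s*\)\s*\}\}")


def extract_refs(sql: str) -> set[str]:
    """Return the model names referenced through ref() in a model's source text."""
    return set(REF_PATTERN.findall(sql))


def model_order(models: dict[str, str]) -> list[str]:
    """Order models so that each one comes after every model it refs."""
    # Map each model to the set of in-project models it depends on.
    graph = {
        name: {ref for ref in extract_refs(sql) if ref in models}
        for name, sql in models.items()
    }
    # TopologicalSorter expects node -> predecessors and yields predecessors first.
    return list(TopologicalSorter(graph).static_order())


# Hypothetical project: model name -> raw (uncompiled) model source.
EXAMPLE_MODELS = {
    "customer_orders": (
        "SELECT * FROM {{ ref('stg_customers') }} c "
        "LEFT JOIN {{ ref('stg_orders') }} o ON c.customer_id = o.customer_id"
    ),
    "stg_orders": "SELECT * FROM {{ source('shop', 'orders') }}",
    "stg_customers": "SELECT * FROM {{ source('shop', 'customers') }}",
}

ORDER = model_order(EXAMPLE_MODELS)
```

In ORDER, both staging models appear before customer_orders; their order relative to each other is not significant. A cycle between models would make TopologicalSorter raise graphlib.CycleError.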

Cross-Model Lineage

When model B selects from {{ ref('model_a') }}, Datoria traces lineage all the way back through model A to the source tables. This works across arbitrary model depth -- staging → intermediate → mart -- giving you true column-level provenance across the entire project.

Across the 59-project test suite: 134,295 output columns traced, 92.4% fully resolved to source origins.

Example

-- models/customer_orders.sql
{{ config(materialized='table') }}

WITH orders AS (
    SELECT * FROM {{ ref('stg_orders') }}
)

SELECT
    c.customer_id,
    c.name,
    COUNT(o.order_id) AS order_count,
    SUM(o.amount) AS total_spent
FROM {{ ref('stg_customers') }} c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name

Datoria compiles the Jinja, parses the SQL, expands SELECT * in the CTE, qualifies all columns, and traces each output: order_count comes from stg_orders.order_id, total_spent comes from stg_orders.amount, through the CTE and the JOIN.
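The cross-model part of this trace can be sketched as a recursive walk over per-model lineage maps. The sketch below is illustrative only: the LINEAGE structure, the raw.orders and raw.customers source tables, and their column names are assumptions, not Datoria's data model or part of the example project.

```python
# Per-model lineage: output column -> list of (upstream relation, upstream column).
# The customer_orders entries follow the example; everything about the stg_*
# models and the raw.* source tables is hypothetical.
LINEAGE = {
    "customer_orders": {
        "customer_id": [("stg_customers", "customer_id")],
        "name": [("stg_customers", "name")],
        "order_count": [("stg_orders", "order_id")],
        "total_spent": [("stg_orders", "amount")],
    },
    "stg_orders": {
        "order_id": [("raw.orders", "id")],
        "customer_id": [("raw.orders", "customer_id")],
        "amount": [("raw.orders", "amount")],
    },
    "stg_customers": {
        "customer_id": [("raw.customers", "id")],
        "name": [("raw.customers", "full_name")],
    },
}


def trace_to_sources(model: str, column: str) -> set[tuple[str, str]]:
    """Follow lineage edges until reaching relations with no lineage entry (sources)."""
    origins: set[tuple[str, str]] = set()
    for upstream, upstream_column in LINEAGE[model][column]:
        if upstream in LINEAGE:
            # Upstream is another model: keep walking toward the sources.
            origins |= trace_to_sources(upstream, upstream_column)
        else:
            # Upstream has no lineage of its own, so treat it as a source table.
            origins.add((upstream, upstream_column))
    return origins


ORDER_COUNT_ORIGINS = trace_to_sources("customer_orders", "order_count")
```

With these assumed mappings, ORDER_COUNT_ORIGINS is {("raw.orders", "id")}: the walk passes through stg_orders.order_id and stops at the source table.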

dbt Functions

Function                          Support
ref(model_name)                   Resolves model references, tracks DAG dependencies
source(source_name, table_name)   Resolves source definitions from YAML
config(key=value)                 Model configuration (materialization, schema, alias, tags)
var(name, default)                Project variables from dbt_project.yml
env_var(name, default)            Environment variable lookup
Adapter dispatch                  Adapter-specific macro implementations (BigQuery vs PostgreSQL variants)
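As an illustration of the lookup semantics for var() and env_var(), the sketch below mirrors dbt's documented behavior: an explicitly passed value wins over dbt_project.yml, the default is used only when nothing else is set, and a missing value without a default is an error. The function signatures, the MissingValueError class, and the example values are assumptions made for this sketch, not Datoria's API.

```python
import os
from typing import Any, Mapping, Optional

_MISSING = object()  # Sentinel so that None can still be used as a real default.


class MissingValueError(KeyError):
    """Raised when a variable has no value and no default (hypothetical name)."""


def var(name: str, default: Any = _MISSING, *,
        project_vars: Mapping[str, Any],
        overrides: Optional[Mapping[str, Any]] = None) -> Any:
    """Look up a project variable: explicit overrides, then project vars, then default."""
    for scope in (overrides or {}, project_vars):
        if name in scope:
            return scope[name]
    if default is not _MISSING:
        return default
    raise MissingValueError(f"var '{name}' is not set and has no default")


def env_var(name: str, default: Any = _MISSING, *,
            environ: Mapping[str, str] = os.environ) -> Any:
    """Look up an environment variable, falling back to the default if given."""
    if name in environ:
        return environ[name]
    if default is not _MISSING:
        return default
    raise MissingValueError(f"env var '{name}' is not set and has no default")


# Hypothetical values standing in for the vars: block of dbt_project.yml.
PROJECT_VARS = {"start_date": "2020-01-01"}

START_DATE = var("start_date", project_vars=PROJECT_VARS)
LOOKBACK_DAYS = var("lookback_days", 7, project_vars=PROJECT_VARS)
```

Here START_DATE comes from the project variables and LOOKBACK_DAYS falls back to its default of 7.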

Macro System

dbt macros are discovered and registered automatically:

  • Project macros from the project's macros/ directory
  • Package macros from installed packages in dbt_packages/
  • Namespace resolution -- project macros take precedence over packages; packages resolve in declaration order
  • Adapter dispatch -- looks up adapter-specific macro variants before falling back to the default
  • Eager parsing, lazy evaluation -- macros are parsed on discovery but only evaluated when called
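The namespace and dispatch rules above can be sketched as two small lookups. The sketch assumes dbt's naming convention for dispatched macros (an adapter-prefixed variant such as bigquery__name, with default__name as the fallback); the function names, package names, and macro bodies are hypothetical, and macros are represented as plain strings rather than parsed templates.

```python
from typing import Optional

MacroTable = dict[str, str]  # macro name -> macro body (a plain string in this sketch)


def resolve_macro(name: str, project_macros: MacroTable,
                  packages: list[tuple[str, MacroTable]]) -> Optional[tuple[str, str]]:
    """Return (namespace, body): the project wins, then packages in declaration order."""
    if name in project_macros:
        return ("project", project_macros[name])
    for package_name, macros in packages:
        if name in macros:
            return (package_name, macros[name])
    return None


def dispatch(name: str, adapter: str, project_macros: MacroTable,
             packages: list[tuple[str, MacroTable]]) -> tuple[str, str, str]:
    """Try the adapter-specific variant first, then the default variant."""
    for candidate in (f"{adapter}__{name}", f"default__{name}"):
        found = resolve_macro(candidate, project_macros, packages)
        if found is not None:
            namespace, body = found
            return (candidate, namespace, body)
    raise LookupError(f"no implementation of '{name}' for adapter '{adapter}'")


# Hypothetical project and packages, listed in declaration order.
PROJECT_MACROS = {"default__safe_divide": "project default"}
PACKAGES = [
    ("pkg_a", {"bigquery__safe_divide": "pkg_a bigquery",
               "default__safe_divide": "pkg_a default"}),
    ("pkg_b", {"bigquery__safe_divide": "pkg_b bigquery"}),
]

ON_BIGQUERY = dispatch("safe_divide", "bigquery", PROJECT_MACROS, PACKAGES)
ON_POSTGRES = dispatch("safe_divide", "postgres", PROJECT_MACROS, PACKAGES)
```

On BigQuery the adapter-specific variant from pkg_a is chosen, because pkg_a is declared before pkg_b. On PostgreSQL no adapter-specific variant exists, so the default variant is used, and the project's copy shadows the one in pkg_a.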

15 Adapter Types

Each dbt adapter maps to a SQL dialect with appropriate type mappings and quoting rules:

BigQuery, PostgreSQL, Redshift, Snowflake, DuckDB, Spark, Databricks, T-SQL, Oracle, DB2, MariaDB, Trino, Presto, ANSI, SQLite.

Jinja Template Language

The Jinja compiler is generated from a grammar definition (the same approach used for SQL), producing a parser and evaluator that handles the complete Jinja2 template language:

  • Control flow -- if/elif/else, for loops with filters and recursion, break/continue
  • Macros -- definitions with parameters and defaults, call blocks for higher-order macros
  • Template composition -- extends (inheritance), import/from...import, include, block
  • Expressions -- arithmetic, comparison, logical operators, list/dict literals, filter pipes, dot access, subscript
  • Scoping -- set (assignment, tuple unpacking, namespace dot notation, set blocks), with blocks
  • Output control -- raw blocks (literal output), autoescape, whitespace trim markers ({%-, -%})

Test Coverage

The pipeline is tested against 59 public dbt projects containing 9,925 models across all major adapters. This includes projects from the dbt community, official examples, and production-style repositories. Tests verify:

  • Jinja compilation success (zero crashes across 9,925 models)
  • SQL parse correctness
  • SELECT * expansion
  • Per-column lineage resolution (134,295 columns, 92.4% resolved)
  • Per-column type inference

Limitations

Datoria analyzes dbt projects statically -- it compiles Jinja and analyzes SQL without executing queries against a database.

  • No runtime query execution -- all analysis is compile-time only. Schema information comes from source definitions and upstream models, not from the database.
  • is_incremental() always returns false -- there is no runtime state to determine whether a model is running incrementally. Queries are analyzed as full-refresh.
  • var() values must be provided -- project variables must be set in dbt_project.yml or passed explicitly. There is no interactive prompt or database connection to resolve them at runtime.
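To show what the is_incremental() limitation means for the SQL that gets analyzed, the sketch below removes the body of every {% if is_incremental() %} block, which is the effect of the condition always evaluating to false. It is a regex stand-in for a real Jinja evaluator and does not handle else branches, compound conditions, or nested if blocks; the function name and the example model are hypothetical.

```python
import re

# Matches {% if is_incremental() %} ... {% endif %}, with optional trim markers.
INCREMENTAL_BLOCK = re.compile(
    r"\{%-?\s*if\s+is_incremental\(\)\s*-?%\}.*?\{%-?\s*endif\s*-?%\}",
    re.DOTALL,
)


def as_full_refresh(template: str) -> str:
    """Drop each is_incremental() block, as if the condition were always false."""
    return INCREMENTAL_BLOCK.sub("", template)


# Hypothetical incremental model.
EXAMPLE_TEMPLATE = """\
SELECT event_id, event_time
FROM {{ ref('stg_events') }}
{% if is_incremental() %}
WHERE event_time > (SELECT MAX(event_time) FROM {{ this }})
{% endif %}
"""

FULL_REFRESH_SQL = as_full_refresh(EXAMPLE_TEMPLATE)
```

FULL_REFRESH_SQL keeps the SELECT and FROM lines and loses the WHERE filter, so lineage and type inference see the model as it would run on a full refresh.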