dbt Support
Datoria reads dbt projects natively -- loading project structure, resolving the model DAG, compiling Jinja templates to SQL, and running the full analysis pipeline (parsing, optimization, lineage, type inference) across every model. Tested against 59 public dbt projects with 9,925 models across all major adapters.
What Works
Point Datoria at a dbt project and it:
- Loads the project -- reads `dbt_project.yml`, `profiles.yml`, source definitions, and installed packages
- Resolves the DAG -- tracks every `ref()` and `source()` call to determine model ordering
- Compiles Jinja -- evaluates templates to pure SQL, including macros, conditionals, loops, and adapter dispatch
- Parses the SQL -- using the correct dialect parser (BigQuery, Snowflake, PostgreSQL, etc.)
- Runs the semantic stack -- scope resolution, column qualification, star expansion, type inference, column lineage
All at ~0.3 ms per model. A 10,000-model project analyzes in under 3 seconds. This is interactive speed, not batch.
Cross-Model Lineage
When model B selects from {{ ref('model_a') }}, Datoria traces lineage all the way back through model A to the source tables. This works across arbitrary model depth -- staging → intermediate → mart -- giving you true column-level provenance across the entire project.
Across the 59-project test suite: 134,295 output columns traced, 92.4% fully resolved to source origins.
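To see how multi-hop tracing chains together, consider a hypothetical three-model pipeline (model and column names invented for illustration):

```sql
-- models/staging/stg_payments.sql: reads the raw source
SELECT id AS payment_id, amount_cents / 100.0 AS amount
FROM {{ source('raw', 'payments') }}

-- models/intermediate/int_payments_enriched.sql: passes amount through
SELECT payment_id, amount, amount * 0.029 AS fee
FROM {{ ref('stg_payments') }}

-- models/marts/fct_revenue.sql: aggregates
SELECT SUM(amount) AS revenue, SUM(fee) AS total_fees
FROM {{ ref('int_payments_enriched') }}
```

Tracing `fct_revenue.revenue` walks back through `int_payments_enriched.amount` and `stg_payments.amount` to the source column `raw.payments.amount_cents`.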
Example
```sql
-- models/customer_orders.sql
{{ config(materialized='table') }}

WITH orders AS (
    SELECT * FROM {{ ref('stg_orders') }}
)

SELECT
    c.customer_id,
    c.name,
    COUNT(o.order_id) AS order_count,
    SUM(o.amount) AS total_spent
FROM {{ ref('stg_customers') }} c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name
```
Datoria compiles the Jinja, parses the SQL, expands `SELECT *` in the CTE, qualifies all columns, and traces each output: `order_count` comes from `stg_orders.order_id` and `total_spent` from `stg_orders.amount`, through both the CTE and the JOIN.
dbt Functions
| Function | Support |
|---|---|
| `ref(model_name)` | Resolves model references, tracks DAG dependencies |
| `source(source_name, table_name)` | Resolves source definitions from YAML |
| `config(key=value)` | Model configuration (materialization, schema, alias, tags) |
| `var(name, default)` | Project variables from `dbt_project.yml` |
| `env_var(name, default)` | Environment variable lookup |
| Adapter dispatch | Adapter-specific macro implementations (BigQuery vs PostgreSQL variants) |
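A model combining several of these functions might look like the following sketch (the source, schema, and variable names are invented for illustration):

```sql
{{ config(materialized='view', schema=env_var('DBT_TARGET_SCHEMA', 'analytics')) }}

SELECT *
FROM {{ source('raw', 'events') }}
WHERE event_date >= '{{ var("start_date", "2024-01-01") }}'
```

Datoria resolves the `source()` reference from the YAML definition, substitutes `var()` and `env_var()` at compile time, and records the `config()` settings against the model.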
Macro System
dbt macros are discovered and registered automatically:
- Project macros from the project's `macros/` directory
- Package macros from installed packages in `dbt_packages/`
- Namespace resolution -- project macros take precedence over packages; packages resolve in declaration order
- Adapter dispatch -- looks up adapter-specific macro variants before falling back to the default
- Eager parsing, lazy evaluation -- macros are parsed on discovery but only evaluated when called
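As an example of adapter dispatch, a project macro following dbt's standard `adapter.dispatch` convention (the macro name here is invented):

```sql
{% macro date_trunc_day(col) %}
    {{ return(adapter.dispatch('date_trunc_day')(col)) }}
{% endmacro %}

{% macro default__date_trunc_day(col) %}
    date_trunc('day', {{ col }})
{% endmacro %}

{% macro bigquery__date_trunc_day(col) %}
    date_trunc({{ col }}, day)
{% endmacro %}
```

When compiling for BigQuery, dispatch selects `bigquery__date_trunc_day`; every other adapter falls back to the `default__` variant.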
15 Adapter Types
Each dbt adapter maps to a SQL dialect with appropriate type mappings and quoting rules:
BigQuery, PostgreSQL, Redshift, Snowflake, DuckDB, Spark, Databricks, T-SQL, Oracle, DB2, MariaDB, Trino, Presto, ANSI, SQLite.
Jinja Template Language
The Jinja compiler is generated from a grammar definition (the same approach used for SQL), producing a parser and evaluator that handles the complete Jinja2 template language:
- Control flow -- `if`/`elif`/`else`, `for` loops with filters and recursion, `break`/`continue`
- Macros -- definitions with parameters and defaults, `call` blocks for higher-order macros
- Template composition -- `extends` (inheritance), `import`/`from ... import`, `include`, `block`
- Expressions -- arithmetic, comparison, logical operators, list/dict literals, filter pipes, dot access, subscript
- Scoping -- `set` (assignment, tuple unpacking, namespace dot notation, set blocks), `with` blocks
- Output control -- `raw` blocks (literal output), `autoescape`, whitespace trim markers (`{%-`, `-%}`)
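A small template exercising several of these constructs together (the column values are invented for illustration):

```sql
{% set statuses = ['placed', 'shipped', 'returned'] %}

SELECT
    order_id
    {%- for status in statuses %},
    SUM(CASE WHEN status = '{{ status }}' THEN 1 ELSE 0 END) AS {{ status }}_count
    {%- endfor %}
FROM {{ ref('stg_orders') }}
GROUP BY order_id
```

The `{%- ... %}` trim markers strip the preceding newline so the compiled SQL stays clean; the `for` loop unrolls into one aggregate per status.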
Test Coverage
The pipeline is tested against 59 public dbt projects containing 9,925 models across all major adapters. This includes projects from the dbt community, official examples, and production-style repositories. Tests verify:
- Jinja compilation success (zero crashes across 9,925 models)
- SQL parse correctness
- `SELECT *` expansion
- Per-column lineage resolution (134,295 columns, 92.4% resolved)
- Per-column type inference
Limitations
Datoria analyzes dbt projects statically -- it compiles Jinja and analyzes SQL without executing queries against a database.
- No runtime query execution -- all analysis is compile-time only. Schema information comes from source definitions and upstream models, not from the database.
- `is_incremental()` always returns `false` -- there is no runtime state to determine whether a model is running incrementally. Queries are analyzed as full-refresh.
- `var()` values must be provided -- project variables must be set in `dbt_project.yml` or passed explicitly. There is no interactive prompt or database connection to resolve them at runtime.
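In practice, this means an incremental model like the following sketch (using dbt's standard incremental pattern; model names invented) is analyzed as if the incremental branch were absent:

```sql
{{ config(materialized='incremental') }}

SELECT event_id, user_id, event_time
FROM {{ ref('stg_events') }}
{% if is_incremental() %}
-- dropped during static analysis: is_incremental() compiles to false
WHERE event_time > (SELECT MAX(event_time) FROM {{ this }})
{% endif %}
```

The lineage and type results are therefore those of the full-refresh query, which is the superset case for column provenance.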