Type Inference
The type inference engine assigns a data type to every expression in a SQL query: column references, literals, function calls, CASE, CAST, aggregates, window functions, and everything they nest inside.
Nullable-aware type system
Every inferred type carries nullability:
| Type | Meaning |
|---|---|
| Non-nullable | Known data type, guaranteed non-null |
| Nullable | Known data type, may be null |
| Null | The NULL literal |
| Unknown | Type could not be resolved (with diagnostic reason) |
This four-state model propagates through expressions. COALESCE(nullable_col, 0) is non-nullable, because the 0 argument is non-nullable, so the output cannot be null. A LEFT JOIN column becomes Nullable even if the source column is non-nullable. A CASE with no ELSE clause is Nullable.
Data types
The type system includes 100+ data types covering all 15 dialects:
- Numeric: TinyInt, SmallInt, Integer, BigInt, Int128, Decimal, BigDecimal, Float32, Float64
- String: Char, NChar, VarChar, NVarChar, Text
- Temporal: Date, Time, DateTime, Timestamp, Timestamptz, Timetz, Interval
- Boolean: Boolean
- Binary: Binary, VarBinary, Blob
- Complex: Array, Struct, Map (with recursive element types)
- Dialect-specific: JSON, JSONB (PostgreSQL), SUPER (Redshift), GEOGRAPHY (BigQuery), VARIANT (Snowflake)
Function signatures
134+ built-in ANSI function signatures are registered, with dialect-aware extensions. Each signature defines how input types map to output types:
- Fixed return:
COUNT(*)always returns Integer. - Passthrough:
ABS(x)returns the same type as its argument. - Coercion:
COALESCE(a, b)returns the least upper bound of all argument types. - Math-aware: dividing two integers produces a floating-point result.
- Conditional:
IF(cond, then, else)coerces the then/else branches. - Constructor:
ARRAY[1, 'a']infers the array element type from its contents. - Aggregate:
SUM(integer_col)widens to BigInt;SUM(decimal_col)preserves Decimal.
Type coercion
When types need to combine (UNION columns, CASE branches, COALESCE arguments), the engine applies coercion hierarchies:
- Numeric rank: TinyInt < SmallInt < Integer < BigInt < Int128 < Decimal < BigDecimal < Float32 < Float64
- Temporal: Date widens to DateTime widens to Timestamp widens to Timestamptz
- String: Text > VarChar > Char > NVarChar > NChar
- Parameterized: Decimal precision and scale, VarChar length, and Timestamp precision are merged to the wider variant
- Complex: Array and Struct element types are coerced recursively
What it powers
Type inference isn't just informational. It enables:
- Lineage accuracy. Knowing that
SUM(a)produces BigInt vs Decimal affects downstream type propagation. - Optimization safety. The optimizer uses types to decide whether a simplification preserves semantics (for example,
x = NULLcan only be simplified if nullability is known). - Schema validation. Check that UNION branches, INSERT values, and CASE branches have compatible types.
- IDE features. Type-aware auto-complete and inline type hints.