Skip to main content

Type Inference

The type inference engine assigns a data type to every expression in a SQL query: column references, literals, function calls, CASE, CAST, aggregates, window functions, and everything they nest inside.

Nullable-aware type system

Every inferred type carries nullability:

TypeMeaning
Non-nullableKnown data type, guaranteed non-null
NullableKnown data type, may be null
NullThe NULL literal
UnknownType could not be resolved (with diagnostic reason)

This four-state model propagates through expressions. COALESCE(nullable_col, 0) is non-nullable, because the 0 argument is non-nullable, so the output cannot be null. A LEFT JOIN column becomes Nullable even if the source column is non-nullable. A CASE with no ELSE clause is Nullable.

Data types

The type system includes 100+ data types covering all 15 dialects:

  • Numeric: TinyInt, SmallInt, Integer, BigInt, Int128, Decimal, BigDecimal, Float32, Float64
  • String: Char, NChar, VarChar, NVarChar, Text
  • Temporal: Date, Time, DateTime, Timestamp, Timestamptz, Timetz, Interval
  • Boolean: Boolean
  • Binary: Binary, VarBinary, Blob
  • Complex: Array, Struct, Map (with recursive element types)
  • Dialect-specific: JSON, JSONB (PostgreSQL), SUPER (Redshift), GEOGRAPHY (BigQuery), VARIANT (Snowflake)

Function signatures

134+ built-in ANSI function signatures are registered, with dialect-aware extensions. Each signature defines how input types map to output types:

  • Fixed return: COUNT(*) always returns Integer.
  • Passthrough: ABS(x) returns the same type as its argument.
  • Coercion: COALESCE(a, b) returns the least upper bound of all argument types.
  • Math-aware: dividing two integers produces a floating-point result.
  • Conditional: IF(cond, then, else) coerces the then/else branches.
  • Constructor: ARRAY[1, 'a'] infers the array element type from its contents.
  • Aggregate: SUM(integer_col) widens to BigInt; SUM(decimal_col) preserves Decimal.

Type coercion

When types need to combine (UNION columns, CASE branches, COALESCE arguments), the engine applies coercion hierarchies:

  • Numeric rank: TinyInt < SmallInt < Integer < BigInt < Int128 < Decimal < BigDecimal < Float32 < Float64
  • Temporal: Date widens to DateTime widens to Timestamp widens to Timestamptz
  • String: Text > VarChar > Char > NVarChar > NChar
  • Parameterized: Decimal precision and scale, VarChar length, and Timestamp precision are merged to the wider variant
  • Complex: Array and Struct element types are coerced recursively

What it powers

Type inference isn't just informational. It enables:

  • Lineage accuracy. Knowing that SUM(a) produces BigInt vs Decimal affects downstream type propagation.
  • Optimization safety. The optimizer uses types to decide whether a simplification preserves semantics (for example, x = NULL can only be simplified if nullability is known).
  • Schema validation. Check that UNION branches, INSERT values, and CASE branches have compatible types.
  • IDE features. Type-aware auto-complete and inline type hints.