Skip to main content

Type Inference

The type inference engine determines the data type of every expression in a SQL query -- from simple column references and literals to complex nested expressions involving functions, CASE, CAST, aggregates, and window functions.

Nullable-Aware Type System

Every inferred type carries nullability information:

TypeMeaning
Non-nullableKnown data type, guaranteed non-null
NullableKnown data type, may be null
NullThe NULL literal
UnknownType could not be resolved (with diagnostic reason)

This four-state model propagates through all expressions. A COALESCE(nullable_col, 0) produces a non-nullable result -- because at least one argument (0) is non-nullable, the output is guaranteed non-null. A LEFT JOIN column becomes Nullable even if the source column is non-nullable. A CASE with no ELSE clause is Nullable.

Data Types

The type system includes 100+ data types covering all 15 dialects:

  • Numeric -- TinyInt, SmallInt, Integer, BigInt, Int128, Decimal, BigDecimal, Float32, Float64
  • String -- Char, NChar, VarChar, NVarChar, Text
  • Temporal -- Date, Time, DateTime, Timestamp, Timestamptz, Timetz, Interval
  • Boolean -- Boolean
  • Binary -- Binary, VarBinary, Blob
  • Complex -- Array, Struct, Map (with recursive element types)
  • Dialect-specific -- JSON, JSONB (PostgreSQL), SUPER (Redshift), GEOGRAPHY (BigQuery), VARIANT (Snowflake)

Function Signatures

134+ built-in ANSI function signatures are registered, with dialect-aware extensions. Each function signature defines how input types map to output types:

  • Fixed return -- COUNT(*) always returns Integer
  • Passthrough -- ABS(x) returns the same type as its argument
  • Coercion -- COALESCE(a, b) returns the least upper bound of all argument types
  • Math-aware -- division of two integers produces a floating-point result
  • Conditional -- IF(cond, then, else) coerces the then/else branches
  • Constructor -- ARRAY[1, 'a'] infers the array element type from its contents
  • Aggregate -- SUM(integer_col) widens to BigInt; SUM(decimal_col) preserves Decimal

Type Coercion

When types need to be combined (UNION columns, CASE branches, COALESCE arguments), the engine applies coercion hierarchies:

  • Numeric rank -- TinyInt < SmallInt < Integer < BigInt < Int128 < Decimal < BigDecimal < Float32 < Float64
  • Temporal -- Date widens to DateTime widens to Timestamp widens to Timestamptz
  • String -- Text > VarChar > Char > NVarChar > NChar
  • Parameterized -- Decimal precision/scale, VarChar length, and Timestamp precision are merged to the wider variant
  • Complex -- Array and Struct element types are coerced recursively

What It Powers

Type inference is not just informational -- it enables:

  • Column lineage accuracy -- knowing that SUM(a) produces BigInt vs Decimal affects downstream type propagation
  • Optimization safety -- the optimizer uses types to determine whether simplifications preserve semantics (e.g., x = NULL can only be simplified if the nullability is known)
  • Schema validation -- verify that UNION branches, INSERT values, and CASE branches have compatible types
  • IDE features -- type-aware auto-complete and inline type hints