MotleyAI · ZmeiGorynych · Jun 16, 2026 · Jun 18, 2026 · Jun 18, 2026 · Jun 18, 2026
diff --git a/docs/database-support.md b/docs/database-support.md
@@ -5,23 +5,35 @@ SQL generation. Databases are supported at two tiers.
 
 ## Tier 1 — fully tested
 
-Integration tests and/or Docker examples; must not regress.
-
-| Engine | Coverage |
-|---|---|
-| **SQLite** | Integration tests in `tests/integration/test_integration.py`; embedded example. |
-| **Postgres** | Integration tests in `tests/integration/test_integration_postgres.py`; Docker example. |
-| **DuckDB** | Integration tests in `tests/integration/test_integration_duckdb.py` (in-process, no Docker). |
-| **MySQL** | Docker example with `verify.py`. |
-| **ClickHouse** | Docker example with `verify.py`. |
-| **SQL Server** | Docker example with `verify.py` in `examples/sqlserver/`. |
-| **Snowflake** | Integration tests in `tests/integration/test_integration_snowflake.py` (skip without `~/.snowflake/connections.toml`); `examples/snowflake/` ships `README.md` + `verify.py`. No Docker (no free local image). |
+Live-instance integration tests must not regress. SQLite and DuckDB run in-process,
+Postgres runs on a temporary instance spawned by `pytest-postgresql`, and MySQL,
+ClickHouse, and SQL Server start the engine via `testcontainers`. The cloud-only engines
+(BigQuery, Snowflake) skip cleanly without credentials and run against the live service in CI otherwise.
+
+| Engine | Live test | Docker example |
+|---|---|---|
+| **SQLite** | `tests/integration/test_integration.py` (in-process) | `examples/embedded/` |
+| **Postgres** | `tests/integration/test_integration_postgres.py` (pytest-postgresql, spawned temp instance) | `examples/postgres/` |
+| **DuckDB** | `tests/integration/test_integration_duckdb.py` (in-process) | `examples/embedded/` (DuckDB mode) |
+| **MySQL** | `tests/integration/test_integration_mysql.py` (`testcontainers[mysql]`) | `examples/mysql/` |
+| **ClickHouse** | `tests/integration/test_integration_clickhouse.py` (`testcontainers[clickhouse]`) | `examples/clickhouse/` |
+| **SQL Server** | `tests/integration/test_integration_sqlserver.py` (`testcontainers`, `msodbcsql18` + `unixodbc-dev` on the runner) | `examples/sqlserver/` |
+| **Snowflake** | `tests/integration/test_integration_snowflake.py` (skips without `~/.snowflake/connections.toml`; profile name overridable via `$SLAYER_SNOWFLAKE_CONNECTION`) | `examples/snowflake/` (no Docker) |
+| **BigQuery** | `examples/bigquery/verify.py` driven by CI against `bigquery-public-data.thelook_ecommerce` (gated on `GCP_PROJECT_ID` / `GCP_SA_KEY_B64` repo secrets) | `examples/bigquery/` (no Docker — managed service) |
+
+BigQuery does not yet have a pytest-style integration suite; its CI coverage
+runs the example's `verify.py` directly via `.github/workflows/ci.yml`. That
+exercises auto-ingestion, basic projection, joins, time-grain dimensions, and
+the cardinality / sum-of-grouped-equals-total invariants — enough to catch
+emitted-SQL regressions, but the verify-script tier is shallower than the
+testcontainers suites.
 
 ## Tier 2 — code-covered
 
 Unit tests for SQL generation; no live-instance verification.
 
-BigQuery, Redshift, Trino/Presto, Databricks/Spark, Oracle.
+Redshift, Trino/Presto (Athena uses the Presto dialect), Databricks/Spark,
+Oracle.
 
 ## Aggregation support
 
@@ -40,6 +52,8 @@ because no standard syntax works everywhere:
 | ClickHouse | yes | yes | yes | yes | Native `median(x)`, parametric `quantile(p)(x)`, native `stddev_*`/`var_*`/`corr`/`covar*` (camelCase variants emitted by sqlglot for `var_samp`). |
 | Snowflake | yes | yes | yes | yes | Native `MEDIAN`, `PERCENTILE_CONT(p) WITHIN GROUP`, `STDDEV_*`/`VAR_*`/`CORR`/`COVAR_*`. `LOG10` native; no native `LOG2` (falls through to `LOG(2, x)`). |
 | MySQL | **no** | **no** | yes | **no** | No native `MEDIAN`/`PERCENTILE_CONT`/`CORR`/`COVAR_*` and no Python-UDF mechanism — SLayer raises `NotImplementedError` for those. `STDDEV_SAMP`/`STDDEV_POP`/`VAR_SAMP`/`VAR_POP` are native on MySQL. Use MariaDB or compute the unsupported aggregations client-side. |
+| SQL Server (T-SQL) | **no** | **no** | yes | yes (decomposed) | `MEDIAN` doesn't exist and T-SQL's `PERCENTILE_CONT` is window-only (no `WITHIN GROUP` aggregate form), so SLayer raises `NotImplementedError` for both. Native `STDEV`/`STDEVP`/`VAR`/`VARP` (SLayer renames the canonical `STDDEV_*`/`VAR_*` names at emit time). `CORR`/`COVAR_*` use the same variance-decomposition formula as MySQL (`cov(x,y) = (var(x+y) − var(x) − var(y)) / 2`, `corr = cov / (stddev(x) · stddev(y))`). |
+| BigQuery | **no** | **no** | yes | yes | BigQuery has no `MEDIAN` aggregate, and its `PERCENTILE_CONT` is analytic-only (no `WITHIN GROUP` syntax), so the base class's `PERCENTILE_CONT(p) WITHIN GROUP (ORDER BY x)` output is not valid BigQuery SQL. If you need percentiles on BigQuery, define a custom `Aggregation` using `APPROX_QUANTILES(x, 100)[OFFSET(N)]` (the approximate `N`th percentile). Native `STDDEV_SAMP`/`STDDEV_POP`/`VAR_SAMP`/`VAR_POP`/`CORR`/`COVAR_SAMP`/`COVAR_POP` (sqlglot may emit `VARIANCE` for `var_samp`). |
 
 ### SQLite caveats
 
@@ -105,6 +119,37 @@ If you need percentiles on MySQL, the recommended options are:
 - Define a custom `Aggregation` on the model with whatever `GROUP_CONCAT`-
   based or windowed expression suits your data shape and group sizes.
 
+### SQL Server (T-SQL) caveats
+
+T-SQL has `STDEV`/`STDEVP`/`VAR`/`VARP` (not `STDDEV_SAMP`/`STDDEV_POP`/
+`VAR_SAMP`/`VAR_POP`); sqlglot's tsql transpiler emits incorrect names like
+`VAR_SAMP` and `VARIANCE_POP`, so the T-SQL dialect overrides the canonical
+spellings via `Anonymous` rewrites in `slayer/sql/dialects/tsql.py`.
+
+`CORR`/`COVAR_SAMP`/`COVAR_POP` are derived from variance:
+`cov(x, y) = (var(x + y) − var(x) − var(y)) / 2`,
+`corr = cov / (stddev(x) · stddev(y))`. The decomposition is shared with
+MySQL via `_build_covar_decomposition` in `slayer/sql/dialects/base.py`.
+
+`MEDIAN` doesn't exist, and `PERCENTILE_CONT` in T-SQL is a window function
+only: `WITHIN GROUP (ORDER BY ...)` must be followed by `OVER (...)`, so there is no
+plain aggregate form. SLayer raises `NotImplementedError` for both at SQL generation time.
+Use the windowed form as a custom `Aggregation` if you need it, or compute client-side.
+
+Other T-SQL specifics surfaced by the dialect:
+
+- `DATETRUNC(unit, col)` for time-grain dimensions (SQL Server 2022+ —
+  earlier versions don't have `DATETRUNC` and aren't supported).
+- `DATETRUNC(iso_week, col)` for Monday-aligned week truncation —
+  `@@DATEFIRST`-independent so the bucketing is deterministic.
+- `DATEADD(unit, n, col)` for time-shift arithmetic — T-SQL has no
+  `INTERVAL` literal.
+- Bracketed `[ident]` quoting — `<model>.<column>` SLayer aliases get
+  mangled to `<model>___<column>` at emit and decoded back on result-row
+  keys (mirror of the BigQuery `___` mangling; see DEV-1571).
+- Native `LOG10`, no native `LOG2` (`log2(x)` falls through to the
+  canonical 2-arg `LOG(2, x)` form).
+
 ### Snowflake caveats
 
 Snowflake is a fully managed cloud warehouse — no Docker, no local instance.
@@ -130,6 +175,40 @@ Snowflake](configuration/datasources.md#snowflake) for connection setup.
   canonical 2-arg `LOG(2, x)` form. `LOG10` and the rest of the math /
   statistical functions are native.
 
+### BigQuery caveats
+
+BigQuery is a fully managed cloud warehouse — no Docker, no local instance.
+CI runs the example's `verify.py` against `bigquery-public-data.thelook_ecommerce`,
+gated on `GCP_PROJECT_ID` and `GCP_SA_KEY_B64` repo secrets (forks without
+them skip cleanly). Authentication uses Google Application Default Credentials
+(`$GOOGLE_APPLICATION_CREDENTIALS` pointing at a service-account JSON key,
+plus `$GCP_PROJECT_ID` for billing). The `bigquery://` driver requires the
+`sqlalchemy-bigquery` extra.
+
+- **No FK introspection.** BigQuery exposes no foreign-key metadata via
+  `INFORMATION_SCHEMA`, so auto-ingestion cannot discover joins. Hand-declare
+  `ModelJoin`s on the model.
+- **Dotted alias mangling.** BigQuery rejects column names containing `.`
+  (output schema names must match `[A-Za-z_][A-Za-z0-9_]*`), so SLayer
+  rewrites `<model>.<column>` aliases (`orders._count`,
+  `orders.products.category`) to `<model>___<column>` at emit time and
+  reverses the mapping on result rows. The triple-underscore separator is
+  distinct from `__` (used by `_query_as_model` for cross-model leaf
+  flattening), so the two encodings never collide. In `Column.sql`,
+  fully-qualified table paths must be backticked per-segment
+  (`` `project`.`dataset`.`table` ``) — a single backticked dotted path of
+  word-only segments (`` `my_dataset.my_table` ``) would be mistaken for a
+  dotted alias and mangled.
+- **No `MEDIAN` aggregate; `PERCENTILE_CONT` is analytic-only.** Both
+  raise at SQL generation time (sqlglot doesn't transpile the base class's
+  `PERCENTILE_CONT(p) WITHIN GROUP (ORDER BY x)` to BigQuery's analytic
+  form). Use a custom `Aggregation` with `APPROX_QUANTILES(x, 100)[OFFSET(N)]`
+  when you need it.
+- **No native EXPLAIN.** BigQuery has no SQL-level `EXPLAIN`. The
+  `BigqueryDialect.explain_prefix` is `None`, so `engine.execute(...,
+  explain=True)` returns the dry-run SQL unchanged rather than an execution
+  plan.
+
 ## Adding a new dialect
 
 1. Add the mapping to `slayer/engine/query_engine.py:_dialect_for_type()`.

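The variance decomposition quoted in the SQL Server caveats above can be sanity-checked numerically. The following is a minimal sketch in plain Python, not SLayer's `_build_covar_decomposition`; the function names and sample data are made up for illustration, and it assumes sample (n - 1) variance throughout.

```python
from statistics import stdev, variance


def covar_samp_decomposed(xs, ys):
    """cov(x, y) = (var(x + y) - var(x) - var(y)) / 2, using sample variance."""
    sums = [x + y for x, y in zip(xs, ys)]
    return (variance(sums) - variance(xs) - variance(ys)) / 2


def covar_samp_direct(xs, ys):
    """Textbook sample covariance, used here only as a reference value."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def corr_decomposed(xs, ys):
    """corr = cov / (stddev(x) * stddev(y)), built on the decomposed covariance."""
    return covar_samp_decomposed(xs, ys) / (stdev(xs) * stdev(ys))


# Illustrative data only.
xs = [1.0, 2.0, 4.0, 7.0]
ys = [2.0, 3.0, 5.0, 11.0]

# The two covariance computations should agree up to floating-point error.
difference = abs(covar_samp_decomposed(xs, ys) - covar_samp_direct(xs, ys))
print(difference < 1e-9)
print(corr_decomposed(xs, ys))
```

Because the identity subtracts quantities of similar magnitude, it may lose precision when the variances are large relative to the covariance. That is a general property of the formula, not a statement about the SQL that SLayer emits.
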
diff --git a/docs/interfaces/pg-facade.md b/docs/interfaces/pg-facade.md
@@ -142,6 +142,122 @@ Postgres-specific predicates that aren't valid SLayer DSL (`ILIKE`, `::cast`, re
 `ANY`/`ALL`) parse but are rejected at execution — use the standard comparison / `IN` /
 `BETWEEN` forms.
 
+### `CAST(<column> AS <type>)` in projection
+
+A projection of the shape `CAST(<column> AS <type>)` (and the equivalent `col::type`
+sugar) is accepted when the inner expression is a bare or qualified column reference
+**and** the (source, target) pair is in the allowlist below. The engine still executes
+the bare column — the cast is a pure wire-layer type rewrite. The projected column's
+Postgres OID is overridden to match the casted type.
+
+Common BI shapes covered: `SELECT CAST(ordered_at AS TIMESTAMP) FROM orders` (DATE
+column promoted for a TIMESTAMP-aware client), `SELECT CAST(amount AS TEXT) AS s
+FROM orders` (stringification), `SELECT CAST(customers.region AS TEXT) FROM orders`
+(joined column).
+
+Out of scope: `CAST` around aggregates (`CAST(SUM(amount) AS DOUBLE)`), `TRY_CAST`,
+and `CAST` around expressions that aren't a bare column (`CAST(SUBSTRING(...) AS T)`).
+`CAST` wrapping a `DATE_TRUNC(...)` continues to route through the time-grain unwrap.
+
+`CAST(...)` in `ORDER BY` and `GROUP BY` has two layers of admission:
+
+1. **Unaliased canonical-form** (e.g. `ORDER BY CAST(c AS T)` repeating the
+   projection's CAST verbatim): **never admitted.** The translator raises
+   `ORDER BY column '...' is not in the projection list` / the GROUP BY
+   strict-on-extras error. Workaround: alias and reference the alias.
+2. **Aliased reference** (`SELECT CAST(c AS T) AS x ... ORDER BY x` /
+   `... GROUP BY x`): admitted **only** when the `(source, target)` pair
+   preserves sort/group semantics under the bare-column engine projection.
+
+Pairs that **fail** the aliased-reference admission and raise
+`ORDER BY on CAST projection '...' with lossy pair X→T is unsupported`
+(symmetric message for GROUP BY):
+
+| Path     | Lossy pairs                                                              |
+|----------|--------------------------------------------------------------------------|
+| ORDER BY | `X → TEXT` for every `X` (lex sort ≠ engine's natural sort)              |
+| GROUP BY | `TIMESTAMP → DATE` (many-to-one rollup); `INT → DOUBLE` (IEEE 754 collapse beyond ±2^53) |
+
+Every other admitted pair (identity `X → X`, `DATE → TIMESTAMP`, and, for ORDER BY
+only, `TIMESTAMP → DATE` and `INT → DOUBLE`) preserves the cast's semantics
+under the bare-column engine projection, so the alias path stays
+open.
+
+```sql
+-- Always rejected (canonical form):
+SELECT CAST(delivered_at AS TIMESTAMP) FROM orders
+ORDER BY CAST(delivered_at AS TIMESTAMP);
+
+-- Aliased reference, safe pair → works:
+SELECT CAST(delivered_at AS TIMESTAMP) AS dt FROM orders
+ORDER BY dt;
+
+-- Aliased reference, lossy pair → rejected:
+SELECT CAST(id AS TEXT) AS s FROM orders ORDER BY s;
+SELECT CAST(ordered_at AS DATE) AS d, COUNT(*) FROM orders GROUP BY d;
+```
+
+The wire-type override still applies in the safe-pair case — `dt` is
+wire-typed `TIMESTAMP` even though the engine sorts the underlying `DATE`.
+A future ticket can lift the remaining restrictions by pushing the CAST
+into the engine SQL.
+
+Admitted (source, target) coercions:
+
+| Source type   | Admitted target types        |
+|---------------|------------------------------|
+| `DATE`        | `DATE`, `TIMESTAMP`, `TEXT`  |
+| `TIMESTAMP`   | `TIMESTAMP`, `DATE`, `TEXT`  |
+| `INT`         | `INT`, `DOUBLE`, `TEXT`      |
+| `DOUBLE`      | `DOUBLE`, `TEXT`             |
+| `BOOLEAN`     | `BOOLEAN`, `TEXT`            |
+| `TEXT`        | `TEXT`                       |
+| *(unknown)*   | `TEXT`                       |
+
+Pairs outside the allowlist (e.g. `CAST(name AS INT)`, `CAST(amount AS BOOLEAN)`)
+raise `Unsupported CAST: cannot project <SOURCE> column as <TARGET> (...). Admitted
+coercions: see docs/interfaces/pg-facade.md.` Unsupported target types (`UUID`,
+`JSON`, `ARRAY`, `STRUCT`, …) raise the standard `Unsupported projection
+expression` error.
+
+#### CAST coarse-OID mapping
+
+CAST is a **coarse wire-OID hint**, not a precision-preserving conversion.
+The SLayer engine projects the bare column unchanged; the pg-facade encoder
+is OID-driven, so the wire bytes always match the OID we advertise. Some
+PostgreSQL types the user can write in a CAST don't have a one-to-one
+SLayer equivalent — those collapse onto the nearest broader SLayer type:
+
+| User wrote in `CAST(... AS X)` | SLayer maps to | Wire OID advertised |
+|---|---|---|
+| `INTEGER` / `INT` (pre-existing) | `DataType.INT` | 20 (`int8`) — not 23 (`int4`) |
+| `SMALLINT` | `DataType.INT` | 20 (`int8`) — not 21 (`int2`) |
+| `TINYINT` / `MEDIUMINT` (non-Postgres widths) | `DataType.INT` | 20 (`int8`) |
+| `BIGINT` | `DataType.INT` | 20 (`int8`) ✓ exact match |
+| `DECIMAL` / `NUMERIC` | `DataType.DOUBLE` | 701 (`float8`) — not 1700 (`numeric`) |
+| `FLOAT` / `REAL` / `DOUBLE` | `DataType.DOUBLE` | 701 (`float8`) ✓ for `FLOAT` / `DOUBLE`; not 700 (`float4`) for `REAL` |
+| `TIMESTAMPTZ` / `TIMESTAMP WITH TIME ZONE` | `DataType.TIMESTAMP` | 1114 (`timestamp`, no TZ) — not 1184 (`timestamptz`) |
+| `TIMESTAMP` / `DATETIME` | `DataType.TIMESTAMP` | 1114 (`timestamp`) ✓ |
+
+What this means in practice:
+
+- The wire bytes the client receives are always consistent with the OID we
+  advertise (the encoder picks the binary/text form from the OID). There is
+  no value corruption.
+- The OID is potentially broader than what the user typed. A client that
+  asked for `NUMERIC` and got `float8` sees a float on the wire and decodes
+  it correctly as a float — but loses the "exact precision" expectation.
+  A client that asked for `TIMESTAMPTZ` sees naive `timestamp` bytes — and
+  loses TZ-aware decoding semantics.
+- Callers needing exact `NUMERIC` precision, narrow integer wire widths, or
+  TZ-aware timestamps must compute upstream (or wait for SLayer to model
+  those types natively).
+
+`DOUBLE → INT` is intentionally excluded: Python's `int(<float>)` truncates toward zero
+while Postgres rounds half-to-even, so silently admitting the pair would diverge from
+`psql` semantics. Pre-aggregate or pre-round on your side when an integer-typed result
+is required.
+
 ## Introspection
 
 * `INFORMATION_SCHEMA.METRICS` / `DIMENSIONS` / `SCHEMATA` / `TABLES` / `COLUMNS`.

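The CAST admission rules documented in `docs/interfaces/pg-facade.md` above can be read as two table lookups. The sketch below transcribes the allowlist and the lossy-pair table into plain Python. It is not the translator's code: the names `ADMITTED`, `GROUP_BY_LOSSY`, `cast_admitted`, and `alias_reference_admitted` are hypothetical, type names are plain strings, and the ORDER BY rule follows the table literally, so `TEXT → TEXT` is treated as rejected even though the doc does not say whether identity is exempt.

```python
from typing import Optional

# Transcribed from the "Admitted (source, target) coercions" table.
# None stands in for an unknown source type (an assumption of this sketch).
ADMITTED = {
    "DATE": {"DATE", "TIMESTAMP", "TEXT"},
    "TIMESTAMP": {"TIMESTAMP", "DATE", "TEXT"},
    "INT": {"INT", "DOUBLE", "TEXT"},
    "DOUBLE": {"DOUBLE", "TEXT"},
    "BOOLEAN": {"BOOLEAN", "TEXT"},
    "TEXT": {"TEXT"},
    None: {"TEXT"},
}

# Transcribed from the GROUP BY row of the lossy-pairs table.
GROUP_BY_LOSSY = {("TIMESTAMP", "DATE"), ("INT", "DOUBLE")}


def cast_admitted(source: Optional[str], target: str) -> bool:
    """Projection-level check: is CAST(<source column> AS <target>) allowed?"""
    return target in ADMITTED.get(source, set())


def alias_reference_admitted(source: Optional[str], target: str, clause: str) -> bool:
    """Check for ORDER BY / GROUP BY on an alias of a CAST projection."""
    if not cast_admitted(source, target):
        return False
    if clause == "ORDER BY":
        # "X -> TEXT for every X" is lossy; read literally, that includes TEXT.
        return target != "TEXT"
    if clause == "GROUP BY":
        return (source, target) not in GROUP_BY_LOSSY
    raise ValueError(f"unexpected clause: {clause!r}")


print(alias_reference_admitted("DATE", "TIMESTAMP", "ORDER BY"))  # safe pair
print(alias_reference_admitted("INT", "TEXT", "ORDER BY"))        # lossy pair
print(alias_reference_admitted("TIMESTAMP", "DATE", "GROUP BY"))  # lossy pair

# Why INT -> DOUBLE is lossy for GROUP BY: distinct integers beyond 2**53
# can collapse onto the same IEEE 754 double.
print(float(2**53) == float(2**53 + 1))  # True
```

The unaliased canonical form (`ORDER BY CAST(c AS T)`) is rejected before either lookup applies, so it is not modeled here.
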
diff --git a/slayer/facade/catalog_sql.py b/slayer/facade/catalog_sql.py
@@ -1222,6 +1222,7 @@ def _substitute_context_functions(self, node: exp.Expression) -> exp.Expression:
         # Try each substitution branch in order; first hit wins.
         substituted = (
             self._substitute_qualified_context_call(node)
+            or self._substitute_qualified_context_column(node)
             or self._substitute_dedicated_func(node)
             or self._substitute_bareword_column(node)
             or self._substitute_anonymous_function(node)
@@ -1253,6 +1254,31 @@ def _substitute_qualified_context_call(
             or self._substitute_anonymous_function(rhs)
         )
 
+    def _substitute_qualified_context_column(
+        self, node: exp.Expression,
+    ) -> Optional[exp.Expression]:
+        """Replace ``pg_catalog.<bareword-ctx-fn>`` where sqlglot parses the
+        whole thing as ``Column(this=<ctx-fn>, table='pg_catalog')`` — the
+        no-parens shape (``pg_catalog.current_user``,
+        ``pg_catalog.current_catalog``). The Dot-shaped variant
+        (``pg_catalog.current_database()``) is handled by
+        ``_substitute_qualified_context_call``.
+        """
+        if not isinstance(node, exp.Column):
+            return None
+        table = node.args.get("table")
+        if table is None:
+            return None
+        table_name = (
+            str(table.this) if hasattr(table, "this") else str(table)
+        ).lower()
+        if table_name != "pg_catalog":
+            return None
+        ident = node.this
+        if not isinstance(ident, exp.Identifier):
+            return None
+        return self._literal_for_context_name(str(ident.this).lower())
+
     def _substitute_dedicated_func(self, node: exp.Expression) -> Optional[exp.Expression]:
         """Dedicated sqlglot Func subclasses (typed nodes for niladic ctx fns)."""
         if isinstance(node, (exp.CurrentDatabase, getattr(exp, "CurrentCatalog", exp.CurrentDatabase))):
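
The "first hit wins" chain that `_substitute_qualified_context_column` joins can be illustrated without sqlglot. This is a rough sketch under stated assumptions: `Column` is a hypothetical stand-in for `exp.Column`, `CONTEXT_VALUES` and its contents are invented, and the helpers return plain strings rather than sqlglot literal nodes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Column:
    """Hypothetical stand-in for sqlglot's exp.Column (name plus optional table)."""
    name: str
    table: Optional[str] = None


# Invented context values, for illustration only.
CONTEXT_VALUES = {"current_user": "slayer", "current_catalog": "slayer"}


def substitute_qualified_column(node: object) -> Optional[str]:
    """Handle the no-parens shape pg_catalog.<ctx-fn>; None means "not my shape"."""
    if not isinstance(node, Column):
        return None
    if node.table is None or node.table.lower() != "pg_catalog":
        return None
    return CONTEXT_VALUES.get(node.name.lower())


def substitute_bareword_column(node: object) -> Optional[str]:
    """Handle the unqualified shape (current_user with no table part)."""
    if not isinstance(node, Column) or node.table is not None:
        return None
    return CONTEXT_VALUES.get(node.name.lower())


def substitute(node: object) -> Optional[str]:
    # Each branch returns None on a miss, so `or` falls through to the next one.
    return substitute_qualified_column(node) or substitute_bareword_column(node)


print(substitute(Column("current_user", table="pg_catalog")))
print(substitute(Column("current_user", table="public")))
print(substitute(Column("amount")))
```

One design consequence of chaining with `or`: a branch that matches but produces a falsy value is indistinguishable from a miss, so every branch in such a chain needs to return either `None` or a truthy replacement.
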