fix(postgres): correctly infer nullability for LEFT JOIN rewritten as RIGHT JOIN#4285
fix(postgres): correctly infer nullability for LEFT JOIN rewritten as RIGHT JOIN#4285luca992 wants to merge 1 commit into
Conversation
… RIGHT JOIN PostgreSQL's planner may execute `A LEFT JOIN B` as a hash join with `Join Type: Right` to put the smaller relation on the hash-build side. This is documented behavior of the planner: * [Postgres docs, 14.3 Controlling the Planner with Explicit JOIN Clauses] > "Most practical cases involving LEFT JOIN or RIGHT JOIN can be > rearranged to some extent." https://www.postgresql.org/docs/current/explicit-joins.html * [Postgres Pro, "Queries in PostgreSQL: 6. Hashing"] > "On the physical level, the planner determines which set is the > inner one and which is the outer one not by their positions in > the query, but by the relative join cost. ... So, the join type > switches from left to right in the plan." https://postgrespro.com/blog/pgsql/5969673 After the swap, the SQL right operand (the nullable side under LEFT JOIN semantics) appears as the plan's `Outer` child rather than the `Inner` child. The previous `visit_plan` only marked `Inner` children as nullable, which on a `Join Type: Right` plan: * incorrectly marked the SQL left operand (always preserved) as nullable — causing spurious `Option<T>` in macro output for NOT NULL columns; and * failed to mark the SQL right operand as nullable — masking real NULLs and panicking at decode time when no LEFT JOIN row matched. Thread the parent join type into `visit_plan` and decide which child is the NULL-fill side based on it: * `Left` → `Inner` child is nullable (no change) * `Right` → `Outer` child is nullable (new) * `Full` → both children are nullable (no change) Also recurse into all child plans (not only when the current node is `Left`/`Right`), so nested joins reached through non-join intermediates like `Hash` are walked. Closes transact-rs#3202.
977cbbb to
691420b
Compare
|
I observed an issue with In my case, the query plan on the unpopulated DB contains a root output of the shape The generated query plan for the populated DB is different and doesn't contain such an expression as one of its root outputs. |
|
@die-herdplatte hmm is there a way you can share for me to reproduce it? |
PostgreSQL's planner may execute
A LEFT JOIN Bas a hash join withJoin Type: Rightto put the smaller relation on the hash-build side.This is documented behavior:
[Postgres docs, 14.3 Controlling the Planner with Explicit JOIN Clauses]
[Postgres Pro, "Queries in PostgreSQL: 6. Hashing"]
After the swap the SQL right operand (the nullable side under LEFT JOIN semantics) ends up as the plan's
Outerchild instead of theInnerchild. The oldvisit_planonly markedInnerchildren nullable, which on aJoin Type: Rightplan:Option<T>in macro output for NOT NULL columnsFix threads the parent join type into
visit_planand picks the NULL-fill side based on it:Left->Innerchild is nullable (no change)Right->Outerchild is nullable (new)Full-> both children are nullable (no change)It also recurses into all child plans, not only when the current node is
Left/Right, so nested joins reached through non-join intermediates likeHashare walked.Does your PR solve an issue?
fixes #3202
Is this a breaking change?
Behavior change yes. The old behavior was wrong inference, not a documented contract.
For queries with a LEFT JOIN that postgres rewrites as a Hash Right Join (driven by
pg_statisticcost estimates, fires on production sized data withplan_cache_mode = force_generic_planwhichsqlx-macros-coreitself sets), generated types change:Columns from the SQL left operand that are NOT NULL in their base table go from
Option<T>toT. So code handling these unneeded nulls would need to be dropped.Columns from the SQL right operand (the LEFT JOIN nullable side) go from
TtoOption<T>. Code treating these as non-null was already at risk ofunexpected null; try decoding as an Optionpanics at runtime whenever a row had no matching join partner. After this it stops compiling and the field needs to becomeOption<T>.The second case is the bigger one. Code passed type checks before only because LEFT JOINs happened to always find a match in test data. After this fix the type system exposes the real nullability.