
plans: §4 sub-PR 2.2 spike memo (discogs-cache match score shape) #794

Open
jakebromberg wants to merge 1 commit into main from plans/discogs-cache-match-score-shape-spike


@jakebromberg commented May 9, 2026

Summary

Spike memo at plans/library-hook-canonicalization/audits/discogs-cache-match-score-shape.md auditing the two questions §4 sub-PR 2.2 spelled out as preconditions for implementation.

The original second commit on this branch (a plan amendment splitting §4 sub-PR 2.2 into 2.2a + 2.2b) has been dropped because the architecture pivot in #800 supersedes it — Backend will no longer implement source-leg backfills directly. LML owns identity resolution end-to-end; Backend caches the verdict via a single bulk-resolve endpoint.

The spike findings stand on their own as reference for LML's own resolution logic, so this PR is narrowed to the memo only.

Spike findings (still valid as reference)

  • Neither flowsheet_match nor fuzzy_resolved has a trgm_score column. The original plan's 0.7 + 0.3 * trgm_score trigram fallback was dead code. flowsheet_match is an exact equi-join (no fuzzy score by construction); fuzzy_resolved discards the score during its resolve step (only array_agg(library_id ORDER BY combined DESC) survives).
  • fuzzy_resolved carries no Discogs ID — only resolved_library_id. The master-vs-release decision applies only to flowsheet_match, where the upstream pins one or the other.
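To illustrate why the score cannot be recovered from fuzzy_resolved, here is a minimal TypeScript sketch of what `array_agg(library_id ORDER BY combined DESC)` does to the candidates. The `FuzzyCandidate` shape and the `resolveLibraryIds` name are hypothetical, introduced only for this illustration; the actual logic lives in the SQL scripts cited in the memo.

```typescript
// Hypothetical shape of one fuzzy candidate row (field names are assumptions,
// not the real column names in the staging table).
interface FuzzyCandidate {
  libraryId: number;
  artistSimilarity: number; // similarity(artist), between 0 and 1
  albumSimilarity: number; // similarity(album), between 0 and 1
}

// Mirrors array_agg(library_id ORDER BY combined DESC): candidates are ranked
// by combined score, but only the ids are returned. The score is used for
// ordering and then dropped, so downstream consumers never see it.
function resolveLibraryIds(candidates: FuzzyCandidate[]): number[] {
  return candidates
    .map((c) => ({
      id: c.libraryId,
      combined: c.artistSimilarity + c.albumSimilarity,
    }))
    .sort((a, b) => b.combined - a.combined)
    .map((c) => c.id);
}
```

Because the return type is `number[]`, any confidence formula that needs a per-row trigram score has nothing to read from, which is the sense in which the planned fallback was dead code.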

These observations belong to LML's resolution domain post-pivot but are documented here so the reasoning isn't lost.

Test plan

  • npm run format:check — clean.
  • All cited file/line references in the spike memo verified against Backend-Service/scripts/discogs-bridge-flowsheet.sql and fuzzy-trigram-flowsheet.sql.

Refs #663, #800.

Audits the two questions §4 sub-PR 2.2 spelled out as preconditions for
implementation. Both findings invalidate the plan's central assumption:

1. Neither flowsheet_match nor fuzzy_resolved has a trgm_score column.
   - flowsheet_match is an exact equi-join on normalized strings; no fuzzy
     score because no fuzzy match. The plan's `0.7 + 0.3 * trgm_score`
     trigram fallback is dead code; alias_match 0.75 is the only viable
     mapping for distinct_entities > 1.
   - fuzzy_resolved discards the trigram score during its resolve step;
     the score lives only on the un-persisted `fuzzy_full` staging table
     as `combined = similarity(artist) + similarity(album) ∈ [1.55, 2.0]`.

2. fuzzy_resolved carries no Discogs ID at all — only resolved_library_id.
   The master-vs-release decision applies only to flowsheet_match (S3),
   where the upstream prefers master and falls back to release; recommended
   mapping is option (c): write whichever the source row pins, never both.
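
Option (c) can be sketched in TypeScript as follows. This is an illustration under stated assumptions, not the writer contract itself: the `FlowsheetMatchRow` fields, the `DiscogsPin` type, and the `pickDiscogsPin` name are all hypothetical.

```typescript
// Hypothetical flowsheet_match row: the upstream pins a master when one is
// available and otherwise a release. Field names are assumptions.
interface FlowsheetMatchRow {
  libraryId: number;
  discogsMasterId: number | null;
  discogsReleaseId: number | null;
}

// Exactly one kind of Discogs ID is ever written, never both.
type DiscogsPin =
  | { kind: "master"; id: number }
  | { kind: "release"; id: number };

// Returns the single ID the source row pins, preferring master to mirror the
// upstream's master-then-release preference; null if the row pins neither.
function pickDiscogsPin(row: FlowsheetMatchRow): DiscogsPin | null {
  if (row.discogsMasterId !== null) {
    return { kind: "master", id: row.discogsMasterId };
  }
  if (row.discogsReleaseId !== null) {
    return { kind: "release", id: row.discogsReleaseId };
  }
  return null;
}
```

Modelling the result as a tagged union makes "never both" a property of the type rather than a convention the writer has to remember.
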

Recommendation: split sub-PR 2.2 into 2.2a (flowsheet_match) + 2.2b
(fuzzy_resolved). The two mapping logics diverge enough — confidence
formula, Discogs ID handling, writer contract — that bisecting cleanly
warrants the second PR.
@jakebromberg changed the title from "plans: §4 sub-PR 2.2 spike memo (discogs-cache match score shape)" to "plans: §4 sub-PR 2.2 spike memo + split into 2.2a + 2.2b" on May 9, 2026
@jakebromberg force-pushed the plans/discogs-cache-match-score-shape-spike branch from 0840a35 to f7d4086 on May 10, 2026 04:06
@jakebromberg changed the title from "plans: §4 sub-PR 2.2 spike memo + split into 2.2a + 2.2b" to "plans: §4 sub-PR 2.2 spike memo (discogs-cache match score shape)" on May 10, 2026