fix(workflow): use main's metadata/repos.yaml as authoritative private-repo list for privacy gate #3395
Co-authored-by: marcusrbrown <831617+marcusrbrown@users.noreply.github.com>
fro-bot left a comment
The fix is correct and the reasoning holds up under scrutiny. The privacy gate reads metadata/repos.yaml to decide what counts as private, then scans the data branch's knowledge/wiki/repos/ for filenames matching those entries. The failure mode you diagnosed is real: resolveCanonicalSlugs is fail-closed and throws on node-null, so the orphan entries (R_kgDOSVJgdw, R_kgDOSZ9x-w) pointing at deleted repos would block the merge in perpetuity. Pinning the reference list to main—where repos.yaml is authoritatively maintained by the survey workflow—is the right move.
What matters for security: this does not loosen the gate. The subject of the leak scan (the wiki files being promoted) still comes from the data branch. Only the definition of private gets pinned to main. Metadata flows main→data, so main is the canonical source by construction. The inverse risk—a repo marked private on data but not yet on main—runs against the actual data flow and isn't a realistic leak path here.
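To make that split concrete, here is a minimal shell sketch of the two inputs: the reference list (what counts as private) and the scan subject (wiki filenames). The `private-slugs.txt` file, the `.md` extension, and the slug values are all hypothetical stand-ins; the real check lives in `scripts/check-wiki-private-presence.ts` and resolves slugs from `metadata/repos.yaml`, which this sketch does not try to reproduce.

```shell
#!/usr/bin/env bash
set -euo pipefail

work="$(mktemp -d)"
trap 'rm -rf "$work"' EXIT

# Hypothetical reference list: one private slug per line
# (stand-in for the list derived from main's metadata/repos.yaml).
printf '%s\n' 'secret-repo' > "$work/private-slugs.txt"

# Hypothetical scan subject: wiki files as they would sit on the data branch.
mkdir -p "$work/data/knowledge/wiki/repos"
touch "$work/data/knowledge/wiki/repos/public-repo.md" \
      "$work/data/knowledge/wiki/repos/secret-repo.md"

# Flag any wiki filename whose slug appears in the private list.
leaks=""
for f in "$work"/data/knowledge/wiki/repos/*.md; do
  slug="$(basename "$f" .md)"
  if grep -Fxq -- "$slug" "$work/private-slugs.txt"; then
    leaks="${leaks:+$leaks }$slug"
  fi
done

echo "leaks: ${leaks:-none}"
```

Swapping where `private-slugs.txt` comes from changes only the definition of private; the loop still walks the data branch's files, which is why pinning the list to `main` does not narrow what gets scanned.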
Verdict: PASS
Blocking issues
None.
Non-blocking concerns
- `.github/workflows/merge-data.yaml:46`: the `cp` silently clobbers the data branch's `repos.yaml`. That's the intent, and if `../metadata/repos.yaml` is ever missing the `cp` fails non-zero and the step errors out before the node check runs, so it stays fail-closed. Fine as-is, but a one-line `# overwrite is intentional` next to the existing comment would spare the next reader the double-take. Optional.
- No new dependencies, no untrusted input reaching the `run:` block, permissions still scoped to `contents: read` at the job level with the App token injected per-step. Clean.
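The fail-closed claim in the first point can be checked locally with a small sketch. It assumes the step runs under bash with errexit (the default for `run:` steps on Linux runners, as far as I know) and uses a throwaway directory layout; the marker file standing in for the check is hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

work="$(mktemp -d)"
trap 'rm -rf "$work"' EXIT

# Only the data-branch copy exists; main's metadata/repos.yaml is deliberately absent.
mkdir -p "$work/data-branch-check/metadata"
printf '%s\n' 'stale data-branch copy' > "$work/data-branch-check/metadata/repos.yaml"
cd "$work/data-branch-check"

# Run copy-then-check in a child bash with -e, as a stand-in for the workflow step.
# The marker file is a hypothetical placeholder for the real check running.
status=0
bash -e -c '
  cp ../metadata/repos.yaml metadata/repos.yaml 2>/dev/null
  echo "check ran" > check-ran.marker
' || status=$?

echo "step exit status: $status"
```

With the source missing, the child shell exits non-zero at the `cp`, the marker is never written, and the data-branch file is left untouched.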
Missing tests
None required. scripts/check-wiki-private-presence.test.ts already exercises the script logic (leak detection, fail-closed resolution modes, ENOENT handling). The cp is a one-line workflow glue step; this repo doesn't unit-test workflow YAML, and adding a harness for a single copy would be more chrome than the change warrants.
Risk assessment
LOW. Single-file, +5/-1 workflow change. It restores a stuck automation to working order without expanding the trust surface—the leak scan's target is unchanged, only its reference list is pinned to the authoritative branch. Worst case on a bad cp is a hard failure, not a silent privacy bypass.
Run Summary
| Field | Value |
|---|---|
| Event | pull_request |
| Repository | fro-bot/.github |
| Run ID | 26730219662 |
| Cache | hit |
| Session | ses_17f3b20f0ffeAfSnx8Pyw0qp0F |
After PR #3394 removed the orphan entries (`R_kgDOSVJgdw`, `R_kgDOSZ9x-w`) from `main`'s `metadata/repos.yaml`, the Merge Data Branch workflow kept failing because the privacy check runs under `working-directory: data-branch-check` (the `data` branch checkout), so it reads `data-branch-check/metadata/repos.yaml`, not `main`'s copy.
Change
Before invoking `check-wiki-private-presence.ts`, copy `main`'s `metadata/repos.yaml` over the data branch copy so the check always uses the canonical private-repo list.
The data branch's wiki files (`knowledge/wiki/repos/`) are still the subject of the leak scan; only the reference list of what counts as private is pinned to `main`.
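As a rough illustration of the copy step described above (not the exact diff), the sketch below simulates the layout in a temporary directory: `main` at the workspace root and the `data` branch under `data-branch-check/`. The file contents are placeholders, and the command that invokes `check-wiki-private-presence.ts` is left out because it is not shown here.

```shell
#!/usr/bin/env bash
set -euo pipefail

work="$(mktemp -d)"
trap 'rm -rf "$work"' EXIT

# Simulated layout (assumption): main at the root, data branch in data-branch-check/.
mkdir -p "$work/metadata" "$work/data-branch-check/metadata"
printf '%s\n' '# main copy (placeholder content)' > "$work/metadata/repos.yaml"
printf '%s\n' '# data copy with stale entries (placeholder content)' \
  > "$work/data-branch-check/metadata/repos.yaml"

# Mirrors working-directory: data-branch-check
cd "$work/data-branch-check"

# Overwrite is intentional: main's list is the authoritative reference.
cp ../metadata/repos.yaml metadata/repos.yaml

# The real step would invoke the privacy check at this point (command omitted).
if cmp -s ../metadata/repos.yaml metadata/repos.yaml; then
  result="pinned-to-main"
else
  result="mismatch"
fi
echo "$result"
```

After the copy, anything reading `metadata/repos.yaml` from inside `data-branch-check/` sees `main`'s list, while the wiki files in that checkout are untouched.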