IABTechLab · jevansnyc · Apr 1, 2026 · aram356 · Apr 6, 2026 · prk-Jr
diff --git a/docs/superpowers/specs/2026-04-01-js-asset-auditor-design.md b/docs/superpowers/specs/2026-04-01-js-asset-auditor-design.md
@@ -0,0 +1,216 @@
+# JS Asset Auditor — Engineering Spec
+
+**Date:** 2026-04-01  
+**Status:** Approved for engineering breakdown  
+**Related:** [JS Asset Proxy spec](2026-04-01-js-asset-proxy-design.md)
+
+---
+
+## Context
+
+The JS Asset Proxy requires a `js-assets.toml` file declaring which third-party JS assets to proxy. Without tooling, populating this file requires manually inspecting network requests in browser DevTools, extracting URLs, generating opaque slugs, and writing TOML — a tedious error-prone process that is a barrier to publisher onboarding.
+
+The Auditor eliminates this friction. It sweeps a publisher's page using the Chrome DevTools MCP, detects third-party JS assets, auto-generates `js-assets.toml` entries, and auto-detects `inject_in_head` from the page DOM. The operator's only remaining decision is reviewing the output before committing.
+
+It also runs as a monitoring tool — `--diff` mode compares a new sweep against the existing config and surfaces new or removed assets, giving publishers ongoing visibility into their third-party JS footprint.
+
+**Implementation:** Pure Claude Code skill — no Rust, no compiled code, no additional dependencies. Uses the Chrome DevTools MCP already configured in `.claude/settings.json`.
+
+---
+
+## Command Interface
+
+```bash
+/audit-js-assets https://www.publisher.com          # init — generate js-assets.toml
+/audit-js-assets https://www.publisher.com --diff   # diff — compare against existing file
+```
+
+---
+
+## Sweep Protocol
+
+1. Read `trusted-server.toml` → extract `publisher.domain` (defines first-party boundary)
+2. Open Chrome via `mcp__chrome-devtools__new_page`, navigate to target URL via `mcp__chrome-devtools__navigate_page`
+3. Wait for full page load + ~6s settle window for async script loads (`mcp__chrome-devtools__wait_for`)
+4. In parallel:
+   - `mcp__chrome-devtools__list_network_requests` → filter for requests where URL ends in `.js` or `Content-Type: application/javascript`, and origin ≠ `publisher.domain`
+   - `mcp__chrome-devtools__evaluate_script` → `Array.from(document.head.querySelectorAll('script[src]')).map(s => s.src)` → collect head-loaded script URLs
+5. Apply heuristic filter (see below)
+6. For each surviving asset, generate a `[[js_assets]]` entry (see below)
+7. Write output (init or diff mode)
+8. Print terminal summary
+9. Close page via `mcp__chrome-devtools__close_page`
+
+---
+
+## Heuristic Filter
+
+The following origin categories are excluded silently. The terminal summary reports what was filtered and why so operators can manually add entries if needed.
+
+| Category | Excluded origins |
+|---|---|
+| Framework CDNs | `cdnjs.cloudflare.com`, `ajax.googleapis.com`, `cdn.jsdelivr.net`, `unpkg.com` |
+| Error tracking | `sentry.io`, `bugsnag.com`, `rollbar.com` |
+| Font services | `fonts.googleapis.com`, `fonts.gstatic.com` |
+| Social embeds | `platform.twitter.com`, `connect.facebook.net` |
+
+**`googletagmanager.com` is not filtered** — GTM is ad tech and should be proxied.
+
+Everything else surfaces for operator review.
+
+---
+
+## Asset Entry Generation
+
+| Field | Derivation |
+|---|---|
+| `slug` | `{publisher_prefix}:{asset_stem}` — see slug algorithm below |
+| `path` | `/{publisher_prefix}/{asset_stem}.js`, or wildcard variant if versioned path detected |
+| `origin_url` | Full captured URL, with wildcard substitution applied if versioned |
+| `ttl_sec` | Omitted — proxy defaults to 1800 (wildcard) or 3600 (fixed) |
+| `inject_in_head` | `true` if URL appeared in head script list from DOM evaluation, else `false` |
+
+### Slug algorithm
+
+```
+publisher_prefix = first_8_chars(base62(sha256(publisher.domain + origin_url)))
+asset_stem       = filename_without_extension(origin_url)
+slug             = "{publisher_prefix}:{asset_stem}"
+```
+
+**Rationale:** Fully opaque and hash-derived — no human naming required, no ambiguity for cryptic vendor filenames. The KV metadata (`origin_url`, `content_type`, `asset_slug`) serves as the lookup table. Operators can query `js-asset:{slug}` in the KV store to retrieve full provenance. The terminal summary also prints slug → origin_url at generation time.
+
+**Important:** This algorithm must produce identical output to the Proxy's KV key derivation. Engineering should implement this as a shared utility (e.g., a small JS/TS helper in the skill, or a standalone `scripts/` utility) rather than duplicating the logic.
+
+### Wildcard detection
+
+Path segments matching either pattern are replaced with `*`:
+- Semver: `\d+\.\d+[\.\d-]*` (e.g., `1.19.8-hcskhn`)
+- Hash-like: `[a-f0-9]{6,}` or `[A-Za-z0-9]{8,}` between path separators
+
+The original URL is preserved as a comment above the generated entry so operators can verify the wildcard substitution is correct.
+
+---
+
+## Init Mode Output
+
+### `js-assets.toml` (written to repo root)
+
+```toml
+# Generated by /audit-js-assets on 2026-04-01
+# Publisher: publisher.com
+# Source URL: https://www.publisher.com
+
+[[js_assets]]
+# https://web.prebidwrapper.com/golf-WnLmpLyEjL/default-v2/prebid-load.js
+slug = "aB3kR7mN:prebid-load"
+path = "/sdk/aB3kR7mN.js"
+origin_url = "https://web.prebidwrapper.com/golf-WnLmpLyEjL/default-v2/prebid-load.js"
+inject_in_head = true
+
+[[js_assets]]
+# https://raven-static.vendor.io/prod/1.19.8-hcskhn/raven.js (wildcard detected)
+slug = "xQ9pL2wY:raven"
+path = "/raven-static/*"
+origin_url = "https://raven-static.vendor.io/prod/*/raven.js"
+inject_in_head = false
+```
+
+### Terminal summary
+
+```
+JS Asset Audit — publisher.com
+────────────────────────────────
+Detected:  8 third-party JS requests
+Filtered:  3 (cdnjs.cloudflare.com ×2, sentry.io ×1)
+Surfaced:  5 assets → js-assets.toml
+
+  aB3kR7mN  inject_in_head=true   web.prebidwrapper.com/.../prebid-load.js
+  xQ9pL2wY  inject_in_head=false  raven-static.vendor.io/prod/*/raven.js  [wildcard]
+  zM4nK8vP  inject_in_head=true   googletagmanager.com/gtm.js
+  ...
+
+Review inject_in_head values and commit js-assets.toml when ready.
+Diff mode: /audit-js-assets <url> --diff
+```
+
+---
+
+## Diff Mode Output
+
+Compares sweep results against the existing `js-assets.toml`.
+
+| Condition | Behavior |
+|---|---|
+| Asset in sweep, not in file | **New** — appended to `js-assets.toml` as a commented-out block |
+| Asset in file, not in sweep | **Missing** — flagged in terminal summary with `⚠`. Never auto-removed. |
+| Asset in both | **Confirmed** — listed as present |
+
+New entries are appended as TOML comments so the file stays valid and nothing is activated without the operator explicitly uncommenting.
+
+### `js-assets.toml` (new entry appended as comment)
+
+```toml
+# --- NEW (detected by /audit-js-assets --diff on 2026-04-01, uncomment to activate) ---
+# [[js_assets]]
+# # https://googletagmanager.com/gtm.js
+# slug = "zM4nK8vP:gtm"
+# path = "/sdk/zM4nK8vP.js"
+# origin_url = "https://googletagmanager.com/gtm.js"
+# inject_in_head = true
+```
+
+### Terminal summary (diff mode)
+
+```
+JS Asset Audit (diff) — publisher.com
+────────────────────────────────
+Confirmed:  4 assets still present on page
+New:        1 asset detected (appended as comment to js-assets.toml)
+Missing:    1 asset no longer seen on page ⚠
+
+  NEW      zM4nK8vP  googletagmanager.com/gtm.js  → review in js-assets.toml
+  MISSING  xQ9pL2wY  raven-static.vendor.io/...   → may have been removed or renamed
+```
+
+---
+
+## Implementation
+
+The Auditor is a Claude Code skill file. No compiled code.
+
+**Skill location:** `.claude/skills/audit-js-assets.md`
+
+**MCP tools used:**
+- `mcp__chrome-devtools__new_page` — open browser tab
+- `mcp__chrome-devtools__navigate_page` — load publisher URL
+- `mcp__chrome-devtools__wait_for` — settle after page load
+- `mcp__chrome-devtools__list_network_requests` — capture JS requests
+- `mcp__chrome-devtools__evaluate_script` — detect head-loaded scripts via DOM query
+- `mcp__chrome-devtools__close_page` — clean up tab
+
+**File tools used:**
+- `Read` — read `trusted-server.toml` (publisher domain) and existing `js-assets.toml` (diff mode)
+- `Write` — write generated/updated `js-assets.toml`
+
+---
+
+## Delivery Order
+
+The Auditor should be delivered **after Proxy Phase 1** (so `js-assets.toml` schema is defined) and **before Proxy Phase 2** (so engineering has real populated entries to test the cache pipeline against actual vendor origins).
+
+See [delivery order in the Proxy spec](2026-04-01-js-asset-proxy-design.md).
+
+---
+
+## Verification
+
+- Run `/audit-js-assets https://www.publisher.com` against a known test publisher page with identified third-party JS
+- Verify generated entries match actual third-party JS observed on the page (cross-check in browser DevTools)
+- Verify `inject_in_head = true` only for scripts that appear in `<head>` (not `<body>`)
+- Verify wildcard detection fires for versioned path segments and not for stable paths
+- Verify GTM (`googletagmanager.com`) is captured and not filtered
+- Verify framework CDNs (`cdnjs.cloudflare.com` etc.) are filtered with reason in summary
+- Run `--diff` against an unchanged page → all entries confirmed, no new/missing
+- Run `--diff` after adding a new vendor script to the page → appears as `NEW` in summary
+- Run `--diff` after removing a script → appears as `MISSING ⚠` in summary, file unchanged