Commit d8a0d84

Address PR feedback on JS Asset Auditor spec
Fix incorrect MCP tool name prefix, replace misused wait_for with evaluate_script setTimeout, correct list_network_requests filtering to use resourceTypes, resolve path derivation contradiction with consistent /js-assets/{prefix}/{stem}.js formula, pin slug separator and base62 charset, add URL Processing section with normalization rules and first-party boundary definition, tighten wildcard regex to require mixed character classes, and move skill location to .claude/commands/.
1 parent 89fab0b commit d8a0d84

1 file changed

Lines changed: 74 additions & 39 deletions

docs/superpowers/specs/2026-04-01-js-asset-auditor-design.md

@@ -2,7 +2,7 @@
22

33
**Date:** 2026-04-01
44
**Status:** Approved for engineering breakdown
5-
**Related:** [JS Asset Proxy spec](2026-04-01-js-asset-proxy-design.md)
5+
**Related:** [JS Asset Proxy spec](2026-04-01-js-asset-proxy-design.md) _(on `js-asset-proxy-spec` branch until merged)_
66

77
---
88

@@ -21,38 +21,63 @@ It also runs as a monitoring tool — `--diff` mode compares a new sweep against
2121
## Command Interface
2222

2323
```bash
24-
/audit-js-assets https://www.publisher.com # init — generate js-assets.toml
25-
/audit-js-assets https://www.publisher.com --diff # diff — compare against existing file
24+
/audit-js-assets https://www.publisher.com # init — generate js-assets.toml
25+
/audit-js-assets https://www.publisher.com --diff # diff — compare against existing file
26+
/audit-js-assets https://www.publisher.com --settle 15000 # longer settle for ad-tech-heavy pages
2627
```
2728

2829
---
2930

3031
## Sweep Protocol
3132

3233
1. Read `trusted-server.toml` → extract `publisher.domain` (defines first-party boundary)
33-
2. Open Chrome via `mcp__chrome-devtools__new_page`, navigate to target URL via `mcp__chrome-devtools__navigate_page`
34-
3. Wait for full page load + ~6s settle window for async script loads (`mcp__chrome-devtools__wait_for`)
34+
2. Open Chrome via `mcp__plugin_chrome-devtools-mcp_chrome-devtools__new_page`, navigate to target URL via `mcp__plugin_chrome-devtools-mcp_chrome-devtools__navigate_page`
35+
3. Wait for page load settle: `mcp__plugin_chrome-devtools-mcp_chrome-devtools__evaluate_script` with `await new Promise(r => setTimeout(r, SETTLE_MS))` where `SETTLE_MS` defaults to 6000 (configurable via `--settle <ms>`)
3536
4. In parallel:
36-
- `mcp__chrome-devtools__list_network_requests` → filter for requests where URL ends in `.js` or `Content-Type: application/javascript`, and origin ≠ `publisher.domain`
37-
- `mcp__chrome-devtools__evaluate_script` with `Array.from(document.head.querySelectorAll('script[src]')).map(s => s.src)` → collect head-loaded script URLs
38-
5. Apply heuristic filter (see below)
37+
- `mcp__plugin_chrome-devtools-mcp_chrome-devtools__list_network_requests` with `resourceTypes: ["script"]` → post-filter to exclude first-party hosts (see URL Processing below)
38+
- `mcp__plugin_chrome-devtools-mcp_chrome-devtools__evaluate_script` with `Array.from(document.head.querySelectorAll('script[src]')).map(s => s.src)` → collect head-loaded script URLs
39+
5. Apply URL normalization (see below), then heuristic filter (see below)
3940
6. For each surviving asset, generate a `[[js_assets]]` entry (see below)
4041
7. Write output (init or diff mode)
4142
8. Print terminal summary
42-
9. Close page via `mcp__chrome-devtools__close_page`
43+
9. Close page via `mcp__plugin_chrome-devtools-mcp_chrome-devtools__close_page`
44+
45+
**`inject_in_head` semantics:** The DOM snapshot in step 4 captures the final state of `<head>` after the settle window. Scripts that were briefly inserted and then removed by a loader will not appear. This is intentional — `inject_in_head = true` means "the script is present in `<head>` at page-stable state." If a loader removes it before the snapshot, the proxy should not re-inject it.
46+
47+
---
48+
49+
## URL Processing
50+
51+
### First-party boundary
52+
53+
A network request is **first-party** if the request URL's host, after stripping a leading `www.`, matches `publisher.domain` (from `trusted-server.toml`) after the same stripping. Matching is exact on the resulting strings.
54+
55+
Publisher-owned CDN subdomains (e.g., `cdn.publisher.com`, `static.publisher.com`) are treated as third-party by default. If the publisher wants to exclude them, they can be added to a `first_party_hosts` list in the command invocation (e.g., `--first-party cdn.publisher.com`).
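
The first-party rule above can be sketched as a small predicate. This is an illustrative sketch only, not part of the spec: the function names and the `extra_hosts` parameter (standing in for hosts passed via `--first-party`) are hypothetical, and lowercasing the host before comparison is an assumption.

```python
from urllib.parse import urlsplit


def strip_www(host: str) -> str:
    """Lowercase the host and drop one leading 'www.' label (lowercasing is an assumption)."""
    host = host.lower()
    return host[4:] if host.startswith("www.") else host


def is_first_party(
    url: str,
    publisher_domain: str,
    extra_hosts: frozenset[str] = frozenset(),  # hypothetical: hosts from --first-party
) -> bool:
    """Return True if the URL's host exactly matches publisher.domain after www-stripping."""
    host = urlsplit(url).hostname or ""
    if host in extra_hosts:
        return True
    # Exact match on the stripped strings; subdomains such as cdn.* do NOT match.
    return strip_www(host) == strip_www(publisher_domain)
```

With this reading, `cdn.publisher.com` stays third-party unless it is listed explicitly, which matches the default described above.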
56+
57+
### URL normalization
58+
59+
Applied to every captured script URL before slug generation and before persisting `origin_url`:
60+
61+
1. Strip fragment (`#...`)
62+
2. Strip all query parameters — cache-busters (`?v=123`, `?cb=timestamp`), consent params, and session tokens all live in query strings. JS asset versioning uses path segments, not query params.
63+
3. Strip trailing slash from the path
64+
65+
The normalized URL is what gets stored in `origin_url` and fed into the slug hash.
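
A minimal sketch of the three normalization steps, assuming Python's standard `urllib.parse` is an acceptable stand-in for whatever URL parser the skill ends up using. The function name `normalize_url` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Strip the fragment, all query parameters, and one trailing slash from the path."""
    parts = urlsplit(url)
    path = parts.path
    if path.endswith("/"):
        path = path[:-1]  # step 3: strip a single trailing slash
    # Steps 1 and 2: rebuild with an empty query and an empty fragment.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))
```

For example, `normalize_url("https://cdn.vendor.io/lib/app.js?v=123#top")` returns `"https://cdn.vendor.io/lib/app.js"`.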
4366

4467
---
4568

4669
## Heuristic Filter
4770

4871
The following origin categories are excluded silently. The terminal summary reports what was filtered and why so operators can manually add entries if needed.
4972

50-
| Category | Excluded origins |
51-
|---|---|
73+
**Matching:** Filter entries match if the request URL's host ends with the filter entry, with a dot-boundary check. For example, `googletagmanager.com` in the filter matches `www.googletagmanager.com` but not `evil-googletagmanager.com`.
74+
75+
| Category | Excluded origins |
76+
| -------------- | ------------------------------------------------------------------------------ |
5277
| Framework CDNs | `cdnjs.cloudflare.com`, `ajax.googleapis.com`, `cdn.jsdelivr.net`, `unpkg.com` |
53-
| Error tracking | `sentry.io`, `bugsnag.com`, `rollbar.com` |
54-
| Font services | `fonts.googleapis.com`, `fonts.gstatic.com` |
55-
| Social embeds | `platform.twitter.com`, `connect.facebook.net` |
78+
| Error tracking | `sentry.io`, `bugsnag.com`, `rollbar.com` |
79+
| Font services | `fonts.googleapis.com`, `fonts.gstatic.com` |
80+
| Social embeds | `platform.twitter.com`, `platform.x.com`, `connect.facebook.net` |
5681

5782
**`googletagmanager.com` is not filtered** — GTM is ad tech and should be proxied.
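
The dot-boundary suffix match could look roughly like the sketch below. The `EXCLUDED_ORIGINS` tuple copies the table above; the function names are hypothetical, and treating an exact host match as a hit is an assumption the spec does not state outright.

```python
# Filter entries copied from the Heuristic Filter table.
EXCLUDED_ORIGINS = (
    "cdnjs.cloudflare.com", "ajax.googleapis.com", "cdn.jsdelivr.net", "unpkg.com",
    "sentry.io", "bugsnag.com", "rollbar.com",
    "fonts.googleapis.com", "fonts.gstatic.com",
    "platform.twitter.com", "platform.x.com", "connect.facebook.net",
)


def host_matches(host: str, entry: str) -> bool:
    """True if host equals entry or ends with '.' + entry (dot-boundary check)."""
    host = host.lower()
    return host == entry or host.endswith("." + entry)


def is_filtered(host: str) -> bool:
    """True if the host falls under any excluded origin."""
    return any(host_matches(host, entry) for entry in EXCLUDED_ORIGINS)
```

Because `googletagmanager.com` is not in the tuple, `is_filtered("www.googletagmanager.com")` is false and GTM surfaces for proxying.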
5883

@@ -62,31 +87,38 @@ Everything else surfaces for operator review.
6287

6388
## Asset Entry Generation
6489

65-
| Field | Derivation |
66-
|---|---|
67-
| `slug` | `{publisher_prefix}:{asset_stem}` — see slug algorithm below |
68-
| `path` | `/{publisher_prefix}/{asset_stem}.js`, or wildcard variant if versioned path detected |
69-
| `origin_url` | Full captured URL, with wildcard substitution applied if versioned |
70-
| `ttl_sec` | Omitted — proxy defaults to 1800 (wildcard) or 3600 (fixed) |
71-
| `inject_in_head` | `true` if URL appeared in head script list from DOM evaluation, else `false` |
90+
| Field | Derivation |
91+
| ---------------- | --------------------------------------------------------------------------------------------------- |
92+
| `slug` | `{publisher_prefix}:{asset_stem}` — see slug algorithm below |
93+
| `path` | Fixed: `/js-assets/{publisher_prefix}/{asset_stem}.js`. Wildcard: `/js-assets/{publisher_prefix}/*` |
94+
| `origin_url` | Normalized URL (see URL Processing), with wildcard substitution applied if versioned |
95+
| `ttl_sec` | Omitted — proxy defaults to 1800 (wildcard) or 3600 (fixed) |
96+
| `stale_ttl_sec` | Omitted — proxy defaults to 86400 (24h) |
97+
| `inject_in_head` | `true` if URL appeared in head script list from DOM evaluation, else `false` |
7298

7399
### Slug algorithm
74100

75101
```
76-
publisher_prefix = first_8_chars(base62(sha256(publisher.domain + origin_url)))
102+
publisher_prefix = first_8_chars(base62(sha256(publisher.domain + "|" + origin_url)))
77103
asset_stem = filename_without_extension(origin_url)
78104
slug = "{publisher_prefix}:{asset_stem}"
79105
```
80106

107+
The pipe (`|`) separator is required — it cannot appear in domain names or at the start of a URL, so the hash input is unambiguous. The `origin_url` fed into the hash must be the normalized URL (see URL Processing).
108+
109+
**base62 charset:** `0-9A-Za-z` (digits first, then uppercase, then lowercase). This matches the `base62` crate convention.
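
As a rough illustration of the slug algorithm, the sketch below uses Python's `hashlib`. One detail is an assumption, not something the spec pins down: the 32-byte SHA-256 digest is interpreted as a big-endian unsigned integer before base62 encoding. The shared utility must settle this so the Auditor and the Proxy agree. All function names are hypothetical.

```python
import hashlib
from urllib.parse import urlsplit

# Charset from the spec: digits, then uppercase, then lowercase.
BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def base62_encode(n: int) -> str:
    """Encode a non-negative integer in base62, most significant digit first."""
    if n == 0:
        return BASE62[0]
    digits = []
    while n:
        n, rem = divmod(n, 62)
        digits.append(BASE62[rem])
    return "".join(reversed(digits))


def make_slug(publisher_domain: str, origin_url: str) -> str:
    """Build '{publisher_prefix}:{asset_stem}' from an already-normalized origin_url."""
    digest = hashlib.sha256(f"{publisher_domain}|{origin_url}".encode("utf-8")).digest()
    # Assumption: digest bytes are read as one big-endian integer.
    prefix = base62_encode(int.from_bytes(digest, "big"))[:8]
    filename = urlsplit(origin_url).path.rsplit("/", 1)[-1]
    stem = filename.rsplit(".", 1)[0]  # filename without its last extension
    return f"{prefix}:{stem}"
```

The prefix values shown in the example entries of this spec (such as `aB3kR7mN`) are placeholders; this sketch will not reproduce them.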
110+
81111
**Rationale:** Fully opaque and hash-derived — no human naming required, no ambiguity for cryptic vendor filenames. The KV metadata (`origin_url`, `content_type`, `asset_slug`) serves as the lookup table. Operators can query `js-asset:{slug}` in the KV store to retrieve full provenance. The terminal summary also prints slug → origin_url at generation time.
82112

83113
**Important:** This algorithm must produce identical output to the Proxy's KV key derivation. Engineering should implement this as a shared utility (e.g., a small JS/TS helper in the skill, or a standalone `scripts/` utility) rather than duplicating the logic.
84114

85115
### Wildcard detection
86116

87-
Path segments matching either pattern are replaced with `*`:
117+
Path segments matching any of these patterns are replaced with `*`:
118+
88119
- Semver: `\d+\.\d+[\.\d-]*` (e.g., `1.19.8-hcskhn`)
89-
- Hash-like: `[a-f0-9]{6,}` or `[A-Za-z0-9]{8,}` between path separators
120+
- Hex hash: `[a-f0-9]{8,}` between path separators (lowercase hex, minimum 8 characters)
121+
- Mixed alphanumeric hash: `[A-Za-z0-9]{8,}` between path separators, **must contain at least one digit and at least one letter** — this excludes pure-alpha dictionary words like `analytics` or `bootstrap`
90122

91123
The original URL is preserved as a comment above the generated entry so operators can verify the wildcard substitution is correct.
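
The wildcard patterns in this section can be sketched as a per-segment check. Two choices here are assumptions: the semver pattern is anchored at the start of the segment only (so the spec's own example `1.19.8-hcskhn` matches despite its alphabetic suffix), while both hash patterns must match the whole segment, which is how "between path separators" is read. Function names are hypothetical.

```python
import re

SEMVER = re.compile(r"\d+\.\d+[\.\d-]*")
HEX_HASH = re.compile(r"[a-f0-9]{8,}")
MIXED_HASH = re.compile(r"[A-Za-z0-9]{8,}")


def is_versioned_segment(segment: str) -> bool:
    """True if a single path segment looks like a version or a hash."""
    if SEMVER.match(segment):  # assumption: prefix-anchored only
        return True
    if HEX_HASH.fullmatch(segment):
        return True
    if MIXED_HASH.fullmatch(segment):
        # Require at least one digit and one letter to exclude dictionary words.
        has_digit = any(c.isdigit() for c in segment)
        has_letter = any(c.isalpha() for c in segment)
        return has_digit and has_letter
    return False


def apply_wildcards(path: str) -> str:
    """Replace every versioned or hash-like segment of a URL path with '*'."""
    return "/".join(
        "*" if seg and is_versioned_segment(seg) else seg for seg in path.split("/")
    )
```

Under these assumptions, `apply_wildcards("/prod/1.19.8-hcskhn/raven.js")` yields `"/prod/*/raven.js"`, and words such as `analytics` or `bootstrap` are left alone.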
92124

@@ -104,14 +136,14 @@ The original URL is preserved as a comment above the generated entry so operator
104136
[[js_assets]]
105137
# https://web.prebidwrapper.com/golf-WnLmpLyEjL/default-v2/prebid-load.js
106138
slug = "aB3kR7mN:prebid-load"
107-
path = "/sdk/aB3kR7mN.js"
139+
path = "/js-assets/aB3kR7mN/prebid-load.js"
108140
origin_url = "https://web.prebidwrapper.com/golf-WnLmpLyEjL/default-v2/prebid-load.js"
109141
inject_in_head = true
110142

111143
[[js_assets]]
112144
# https://raven-static.vendor.io/prod/1.19.8-hcskhn/raven.js (wildcard detected)
113145
slug = "xQ9pL2wY:raven"
114-
path = "/raven-static/*"
146+
path = "/js-assets/xQ9pL2wY/*"
115147
origin_url = "https://raven-static.vendor.io/prod/*/raven.js"
116148
inject_in_head = false
117149
```
@@ -140,11 +172,11 @@ Diff mode: /audit-js-assets <url> --diff
140172

141173
Compares sweep results against the existing `js-assets.toml`.
142174

143-
| Condition | Behavior |
144-
|---|---|
145-
| Asset in sweep, not in file | **New** — appended to `js-assets.toml` as a commented-out block |
175+
| Condition | Behavior |
176+
| --------------------------- | ----------------------------------------------------------------------- |
177+
| Asset in sweep, not in file | **New** — appended to `js-assets.toml` as a commented-out block |
146178
| Asset in file, not in sweep | **Missing** — flagged in terminal summary with `⚠`. Never auto-removed. |
147-
| Asset in both | **Confirmed** — listed as present |
179+
| Asset in both | **Confirmed** — listed as present |
148180

149181
New entries are appended as TOML comments so the file stays valid and nothing is activated without the operator explicitly uncommenting.
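
The three diff conditions reduce to set operations. The sketch below is illustrative only and assumes entries are compared by `slug`; the spec does not say which field is the comparison key. The function name is hypothetical.

```python
def classify_assets(sweep_slugs: set[str], file_slugs: set[str]) -> dict[str, list[str]]:
    """Split slugs into new, missing, and confirmed, per the diff-mode table."""
    return {
        "new": sorted(sweep_slugs - file_slugs),        # in sweep, not in file
        "missing": sorted(file_slugs - sweep_slugs),    # in file, not in sweep
        "confirmed": sorted(sweep_slugs & file_slugs),  # in both
    }
```

Only the `new` bucket results in a write (as commented-out TOML); `missing` entries are reported and never removed.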
150182

@@ -155,7 +187,7 @@ New entries are appended as TOML comments so the file stays valid and nothing is
155187
# [[js_assets]]
156188
# # https://googletagmanager.com/gtm.js
157189
# slug = "zM4nK8vP:gtm"
158-
# path = "/sdk/zM4nK8vP.js"
190+
# path = "/js-assets/zM4nK8vP/gtm.js"
159191
# origin_url = "https://googletagmanager.com/gtm.js"
160192
# inject_in_head = true
161193
```
@@ -179,17 +211,20 @@ Missing: 1 asset no longer seen on page ⚠
179211

180212
The Auditor is a Claude Code skill file. No compiled code.
181213

182-
**Skill location:** `.claude/skills/audit-js-assets.md`
214+
**Skill location:** `.claude/commands/audit-js-assets.md`
183215

184216
**MCP tools used:**
185-
- `mcp__chrome-devtools__new_page` — open browser tab
186-
- `mcp__chrome-devtools__navigate_page` — load publisher URL
187-
- `mcp__chrome-devtools__wait_for` — settle after page load
188-
- `mcp__chrome-devtools__list_network_requests` — capture JS requests
189-
- `mcp__chrome-devtools__evaluate_script` — detect head-loaded scripts via DOM query
190-
- `mcp__chrome-devtools__close_page` — clean up tab
217+
218+
- `mcp__plugin_chrome-devtools-mcp_chrome-devtools__new_page` — open browser tab
219+
- `mcp__plugin_chrome-devtools-mcp_chrome-devtools__navigate_page` — load publisher URL
220+
- `mcp__plugin_chrome-devtools-mcp_chrome-devtools__list_network_requests` — capture JS requests
221+
- `mcp__plugin_chrome-devtools-mcp_chrome-devtools__evaluate_script` — settle window + detect head-loaded scripts via DOM query
222+
- `mcp__plugin_chrome-devtools-mcp_chrome-devtools__close_page` — clean up tab
223+
224+
**Permission grants required:** `navigate_page`, `list_network_requests`, and `close_page` are not currently approved in `.claude/settings.json`. Add them to `permissions.allow` before running the skill, or expect interactive permission prompts on first run.
191225

192226
**File tools used:**
227+
193228
- `Read` — read `trusted-server.toml` (publisher domain) and existing `js-assets.toml` (diff mode)
194229
- `Write` — write generated/updated `js-assets.toml`
195230

@@ -199,7 +234,7 @@ The Auditor is a Claude Code skill file. No compiled code.
199234

200235
The Auditor should be delivered **after Proxy Phase 1** (so `js-assets.toml` schema is defined) and **before Proxy Phase 2** (so engineering has real populated entries to test the cache pipeline against actual vendor origins).
201236

202-
See [delivery order in the Proxy spec](2026-04-01-js-asset-proxy-design.md).
237+
See [delivery order in the Proxy spec](2026-04-01-js-asset-proxy-design.md) _(on `js-asset-proxy-spec` branch until merged)_.
203238

204239
---
205240
