perf(pm): demand resolver — driver loop + graph helpers (2/3) by elrrrrrrr · Pull Request #3086 · utooland/utoo

elrrrrrrr · 2026-05-28T04:16:37Z

Part 2/3 of the #3084 split — the demand-driven BFS resolver loop. Stacks on PR-A (#3085, the ManifestProvider trait + the registry-backed adapter). The cutover that points the public build_deps_* and resolve_* entry-points + api::build_lockfile at the demand pipeline, and lands the two runtime tunings the demand driver's single-flight makes pay off, is PR-C — which I'll open after both #3085 and this PR's bench gates are in.

The stack after the amendment

The original PR-A had bundled the HTTP-pool and the resolver-side concurrency knob alongside the trait. Bench on that earlier shape came in at a 14% wall-clock regression on p1_resolve vs utoo-next (3.27s vs 2.86s on the standard ant-design workspace) because those tunings only pay off on a single-flight pipeline (which the legacy resolver isn't) and overcommit the npmjs per-IP rate-limit when applied to the per-edge legacy fetcher. Per the lesson from the earlier multi-PR rebuilds (memory note: "split PRs by 'does it ship the perf' not just 'additive vs delete'"), the tunings moved to the cutover PR where they're paired with the entry-point flip that makes them sound. After that amendment the stack reads:

next ──► #3083 (select + PackageVersions, ready)
          └── #3085 PR-A (provider trait + adapter; bench expected flat vs next)
                └── this PR (PR-B, driver + four graph helpers; bench expected flat vs PR-A)
                      └── PR-C (cutover + runtime tunings + dead-code-allowance removals; bench-gated win)

#3084 stays open as the integrated-form reference baseline until the three split PRs all land in next. Its bench-comment record (the 2.45s ± 0.13 p1_resolve, vCtx 18.1K on the same ant-design workspace) is the target the cutover PR's bench needs to match.

What lands here

crates/ruborist/src/resolver/demand/driver.rs (new, +701) is the BFS loop and the fetch pipeline. It owns the per-run ManifestState cache + waiters + failures store from #3079, the FetchQueues scheduler with single-flight de-dup from #3080, and the pure per-edge decision step select_edge → EdgeStep / WaitKey / FetchPlan from #3083. The pipeline pumps the trait's job stream through a FuturesUnordered of tokio::task::JoinHandles — the multi-core spawn shape that gives the resolver native fan-out (tokio::spawn on native, the single-threaded tokio::task::spawn_local on wasm, gated by the same #[cfg_attr] on the trait's Send + Sync bound from PR-A). The apply_fetch_result glue feeds resolved manifests back into the graph through the four new graph-building helpers in builder.rs (see below). The handle_processed wrapper fires the existing BuildEvent::Resolved and BuildEvent::Failed events so progress receivers see the same event surface across both the legacy two-phase pipeline and the demand pipeline once the cutover wires the entry-points across.
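To make the per-edge decision step concrete, here is a minimal sketch of a pure `select_edge`-style function. It is not the code from #3083: the `State` struct, the field names, the variant payloads, and the order of the checks are all assumptions made for illustration, and the version-cache-hit case is left out to keep it short.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-ins for the real WaitKey / FetchPlan / EdgeStep types.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct WaitKey {
    name: String,
    spec: String,
}

#[derive(Debug, PartialEq)]
struct FetchPlan {
    key: WaitKey,
}

#[derive(Debug, PartialEq)]
enum EdgeStep {
    CacheHit(String), // a resolved version is already cached for this key
    Wait(WaitKey),    // a fetch for this key is in flight: park the edge
    Fetch(FetchPlan), // nothing cached or in flight: schedule a fetch
    Fail(String),     // an earlier fetch for this key failed
}

// Hypothetical per-run state: cache + in-flight set + failures store.
#[derive(Default)]
struct State {
    resolved: HashMap<WaitKey, String>,
    in_flight: HashSet<WaitKey>,
    failures: HashMap<WaitKey, String>,
}

// Pure decision: reads the state, never mutates it, never does I/O.
fn select_edge(state: &State, name: &str, spec: &str) -> EdgeStep {
    let key = WaitKey { name: name.to_string(), spec: spec.to_string() };
    if let Some(version) = state.resolved.get(&key) {
        return EdgeStep::CacheHit(version.clone());
    }
    if let Some(reason) = state.failures.get(&key) {
        return EdgeStep::Fail(reason.clone());
    }
    if state.in_flight.contains(&key) {
        return EdgeStep::Wait(key);
    }
    EdgeStep::Fetch(FetchPlan { key })
}
```

Keeping the decision pure means the driver loop is the only place that mutates state or spawns work, so the step can be unit-tested without a registry.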

crates/ruborist/src/resolver/builder.rs (+134 / -3) gains four helpers (one pub, three pub(crate)) next to the existing process_dependency, plus a new pub(crate) async entry function. None of them has a caller in this PR outside the demand driver itself; the legacy entry chain (build_deps, build_deps_with_receiver, build_deps_with_config, resolve, resolve_with_options) keeps its existing R: RegistryClient signatures and its existing preload-then-BFS body, so the active runtime path is unchanged from PR-A and the orphan-comment annotations on gather_preload_deps / run_preload_phase / run_bfs_phase aren't added yet (the cutover is what makes them orphans).

New top-level item in `builder.rs`	Purpose
`pub(crate) async fn build_deps_with_config_output<R, E>(graph, registry, config, receiver) -> Result<ResolverManifestCache, ResolveError<R::Error>> where R: ManifestProvider, R::Error: Send, E: EventReceiver`	The demand-side entry the cutover PR will route both the public `build_deps_with_config` (unit-returning entry) and `api::build_lockfile`'s host-side cache-export step through. Body is the `tracing::info!` line, the `run_main_loop_bfs` call, and the `Ok(manifest_cache)`. Carries `#[allow(dead_code)]` with a one-line comment naming PR-C as the caller. The three new `use` statements at the top of the file (`CoreVersionManifest` joining the existing `crate::model::manifest::NodeManifest` brace-list, the new `use crate::resolver::demand::{ResolverManifestCache, run_main_loop_bfs}`, and `ManifestProvider` joining `crate::service::ProjectCacheData`) are the only edits to existing lines in this file. The signatures of the legacy entries are untouched.
`pub(crate) fn try_reuse_dependency(graph, parent, edge, resolved_name, resolved_version) -> Option<EdgeStitched>`	Hits the graph's existing-node index before the driver issues a fetch, so repeat references to the same `(name, resolved-version)` share one graph node rather than producing parallel duplicates. Extracted from the inline logic in `process_dependency` so the demand path can call it without going back through the legacy entry chain.
`pub fn process_dependency_with_resolved(graph, parent, edge, resolved_name, manifest, dev_deps, peer_deps, source) -> NodeIndex`	The edge-resolution tail: given a resolved `CoreVersionManifest` for a child of `parent`, creates or reuses the dependent node, attaches the dependency-typed edge, sets the resolution mode flags. The `pub` (rather than `pub(crate)`) visibility matches the existing `pub` on `process_dependency` — same audience surface, same level of abstraction.
`pub(crate) fn chain_err<E>(parent_chain, err) -> ResolveError<E>`	The error-chaining wrapper that lifts a per-job `RegistryError` from the provider's job stream into the resolver's existing `ResolveError::WithChain`, so the CLI's chain-aware error renderer (`pm::format_print`) still gets the "outermost name → inner name → cause" causality string when the demand path's job-batch reports a failed manifest fetch, matching the equivalent error shape the legacy single-fetch path emits.
`pub(crate) async fn handle_resolved_registry_manifest<R, E>(graph, registry, receiver, parent, edge, resolved_name, manifest, state) -> Result<HandleResult, ResolveError<R::Error>> where R: ManifestProvider, R::Error: Send, E: EventReceiver`	The "we have a resolved version manifest, integrate it" step. Caches the manifest under both the requested spec and the resolved-version key in the per-run `ManifestState::version`, builds the resolved-version's edges into the queue of work the driver hasn't dispatched yet, fires `BuildEvent::Resolved(name, version)` on the receiver.
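The existing-node lookup that `try_reuse_dependency` is described as doing can be sketched as follows. This is an illustration only: the `Graph` struct, its fields, the `usize` node index, and the `add_node` helper are hypothetical and do not reflect the real ruborist graph API.

```rust
use std::collections::HashMap;

// Hypothetical node handle; the real graph has its own NodeIndex type.
type NodeIndex = usize;

#[derive(Default)]
struct Graph {
    nodes: Vec<(String, String)>,                 // (name, resolved version)
    index: HashMap<(String, String), NodeIndex>,  // existing-node index
    edges: Vec<(NodeIndex, NodeIndex)>,           // (parent, child)
}

impl Graph {
    // Register a new node and record it in the index.
    fn add_node(&mut self, name: &str, version: &str) -> NodeIndex {
        let idx = self.nodes.len();
        self.nodes.push((name.to_string(), version.to_string()));
        self.index.insert((name.to_string(), version.to_string()), idx);
        idx
    }

    // If (name, version) already has a node, attach the edge to it and
    // return it; otherwise return None so the caller goes on to fetch.
    fn try_reuse(&mut self, parent: NodeIndex, name: &str, version: &str) -> Option<NodeIndex> {
        let existing = *self.index.get(&(name.to_string(), version.to_string()))?;
        self.edges.push((parent, existing));
        Some(existing)
    }
}
```

Checking the index before issuing a fetch is what lets two parents that resolve to the same `(name, resolved-version)` share one node.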

crates/ruborist/src/resolver/demand/mod.rs (+13 / -5) declares pub mod driver and re-exports driver::run_main_loop_bfs and state::ResolverManifestCache at the demand-module level so builder.rs's new entry function can name them as crate::resolver::demand::{run_main_loop_bfs, ResolverManifestCache}. The crate-level re-exports already exist through the lib.rs's pub mod resolver.

crates/ruborist/src/resolver/demand/queue.rs (+5 / -3) tweaks the visibility of the FetchKey and FetchDone types so the driver can name them outside the queue module, and adjusts the FetchPriority enum's variant ordering so the driver's pop_next walks them in the right BFS-first order (semver-resolved-version > full-manifest > already-cached-version-lookup).
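Why variant ordering matters can be shown with a small sketch. It assumes the priority is compared through a derived `Ord` and drained from a max-heap; the variant names and the `BinaryHeap` are illustrative guesses, not the actual `FetchPriority` or `pop_next` implementation.

```rust
use std::collections::BinaryHeap;

// With derive(Ord), later variants compare greater than earlier ones,
// so declaration order is the priority order. Names are hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum FetchPriority {
    CachedVersionLookup, // lowest priority
    FullManifest,
    ResolvedVersion,     // highest priority
}

// Drain a max-heap: the greatest (highest-priority) item comes out first.
fn drain_in_priority_order(items: Vec<FetchPriority>) -> Vec<FetchPriority> {
    let mut heap: BinaryHeap<FetchPriority> = items.into_iter().collect();
    let mut out = Vec::with_capacity(heap.len());
    while let Some(next) = heap.pop() {
        out.push(next);
    }
    out
}
```

Under these assumptions, reordering the variants in the enum declaration is enough to change the pop order, with no change to the queue code itself.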

crates/ruborist/src/resolver/demand/state.rs (+6 / -0) adds one annotation: #[allow(dead_code)] on the ResolverManifestCache.entries field. The field is written by ManifestState::into_resolver_cache() at the end of run_main_loop_bfs and read by ProjectCacheData::from_resolved(cache.entries) in the cutover PR's api.rs edit. In this PR it's written-only (the only writer path is reachable through PR-B's dead-coded build_deps_with_config_output), so the warning would fire under the strict cargo clippy --all-targets -- -D warnings invocation CLAUDE.md mandates. The annotation has a 5-line comment naming the cutover PR's api.rs change as the reader and pointing at the matching annotation on PR-A's ProjectCacheData bridges in service/cache.rs. Both annotations come off in PR-C when the writer-chain joins the reader-chain.

The cutover-dependent test ignore

The driver's #[cfg(test)] mod tests block (in driver.rs itself) holds the unit-test scaffolding the driver needs: a MockRegistryClient that simulates a registry's full-manifest and version-manifest responses, a CountingRegistry wrapper that delegates to an inner mock but increments an AtomicUsize for every request_manifests call that involves a given package name (so single-flight de-duplication for that name is observable), and the create_*_manifest helpers. One test in this block — test_non_semver_exact_version_extract_single_flight — sets up a root with two siblings a and b both depending on the same exact-version shared@1.2.3, drives a full resolve(pkg, &counting_registry) through the resolver crate's public entry, and asserts that the counter for shared@1.2.3 reads exactly 1 (the demand driver's single-flight de-dup folded the two waiters onto one provider job).

In this PR the public resolve entry still goes through the legacy RegistryClient::fetch_version_manifest path — the entry-point bound flip from R: RegistryClient to R: ManifestProvider is the cutover's payload — so the counter (which lives on the CountingRegistry's ManifestProvider impl) never increments and the assertion's "left: 0, right: 1" panic is exactly the failure mode you'd see on a clean checkout of this PR. The test gets #[ignore = "exercises the demand-driver pipeline, wired in the cutover PR"] with a multi-line comment explaining the reason; the cutover PR removes the #[ignore] once resolve delegates to build_deps_with_config_output. The other tests in the driver's test module exercise the loop's invariants directly (state transitions, the schedule-and-pump cycle, the waiter wake-up after a fetch completes) without going through the public entry, and they pass under this PR's intermediate state. cargo test -p utoo-ruborist --lib on 63a18ad6 reports 181 passed; 0 failed; 1 ignored.
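The property the ignored test asserts, two waiters folded onto one provider job, can be sketched without any async machinery. The `SingleFlight` struct below is a hypothetical, synchronous stand-in for the `FetchQueues` de-duplication; the string keys and the `jobs_issued` counter (playing the role of the `CountingRegistry` counter) are assumptions.

```rust
use std::collections::HashMap;

// Hypothetical single-flight table: one pending job per key,
// with every requester recorded as a waiter on that job.
#[derive(Default)]
struct SingleFlight {
    waiters: HashMap<String, Vec<String>>,
    jobs_issued: usize,
}

impl SingleFlight {
    // Register `waiter` for `key`. Returns true only when this call
    // started a new job, i.e. no fetch for `key` was already pending.
    fn request(&mut self, key: &str, waiter: &str) -> bool {
        let entry = self.waiters.entry(key.to_string()).or_default();
        let is_first = entry.is_empty();
        entry.push(waiter.to_string());
        if is_first {
            self.jobs_issued += 1;
        }
        is_first
    }

    // The job for `key` finished: hand back every waiter to wake up.
    fn complete(&mut self, key: &str) -> Vec<String> {
        self.waiters.remove(key).unwrap_or_default()
    }
}
```

With siblings `a` and `b` both requesting `shared@1.2.3`, only the first request issues a job, which is the "counter reads exactly 1" shape the test checks once `resolve` runs through the demand driver.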

Bench expectation

The benchmark label is on. The driver is in the binary but unreferenced from any entry-point the bench harness drives: the harness runs utoo install on the ant-design workspace and times the resolve phase, and utoo install's resolver entry is the legacy api::build_lockfile → build_deps_with_config → run_preload_phase then run_bfs_phase chain. The active runtime path is therefore the same as PR-A's, and the expected p1_resolve and vCtx numbers match PR-A's flat-vs-next baseline (≈ 2.86s wall and ≈ 47K vCtx on ant-design after PR-A's amendment dropped the runtime-tuning regression). The full ≈ 2.45s / ≈ 18K vCtx win, which is what the integrated-form #3084's bench measured, surfaces at the cutover PR's bench gate.

Local hygiene on the rebased tree

After the cross-stack rebase (PR-B's single commit 0360cd64 got replayed onto the amended PR-A tip c16893ec, yielding the current 63a18ad6 whose parent is c16893ec, with no rebase conflicts because PR-B touches builder.rs and demand/* while PR-A's amendment dropped service/http.rs + pm/util/user_config.rs + pm/helper/ruborist_context.rs — disjoint file sets):

cargo check -p utoo-ruborist -p utoo-pm: clean.
cargo clippy -p utoo-ruborist -p utoo-pm --all-targets -- -D warnings --no-deps (the strict form CLAUDE.md mandates after a Rust edit): clean.
cargo test -p utoo-ruborist --lib: 181 passed; 0 failed; 1 ignored (the #[ignore]'d single-flight test mentioned above).
cargo fmt -p utoo-ruborist -p utoo-pm --check: clean.

The workspace-wide cargo check still hits a pre-existing turbopack_nodejs::{EcmascriptBuildNodeChunk, EcmascriptBuildNodeEntryChunk} "not in the root" error in pack-api/src/webpack_stats.rs — that's the next.js submodule's drift, present on next and every branch in the repo, unrelated to this stack.

Refs #3028, #3083, #3084, base PR-A #3085. The cutover PR will reference all four.

🤖 Generated with Claude Code

gemini-code-assist

Code Review

This pull request introduces the BFS driver loop (run_main_loop_bfs) for demand-driven dependency resolution, staging the orchestration of manifest fetches, cache state, and graph building. In builder.rs, the handle_resolved_registry_manifest function contains an inefficiency where it calls process_dependency when an override is present, leading to redundant registry requests and lookups. It is recommended to directly resolve the override spec and integrate it into the graph.

gemini-code-assist · 2026-05-28T04:18:36Z

+    let processed = if graph
+        .check_override(parent, &edge.name, Some(&resolved.version))
+        .is_some()
+    {
+        process_dependency(graph, registry, parent, edge, config)
+            .await
+            .map_err(|inner| chain_err(graph, parent, edge, inner))?
+    } else {
+        receiver.on_event(BuildEvent::PackageResolved((&*resolved.manifest).into()));
+        process_dependency_with_resolved(graph, parent, edge, &resolved, config)
+    };


In handle_resolved_registry_manifest, calling process_dependency when an override is present is highly inefficient. Since process_dependency is designed to resolve a dependency from scratch, it will re-resolve the original spec (which we already resolved to get manifest!) before checking the override and resolving the override spec. This results in redundant registry requests and lookups.

Instead, we should directly resolve the override_spec using resolve_registry_dep and then integrate it into the graph using process_dependency_with_resolved.

Suggested change

-    let processed = if graph
-        .check_override(parent, &edge.name, Some(&resolved.version))
-        .is_some()
-    {
-        process_dependency(graph, registry, parent, edge, config)
-            .await
-            .map_err(|inner| chain_err(graph, parent, edge, inner))?
-    } else {
-        receiver.on_event(BuildEvent::PackageResolved((&*resolved.manifest).into()));
-        process_dependency_with_resolved(graph, parent, edge, &resolved, config)
-    };
+    let processed = if let Some(override_spec) = graph.check_override(parent, &edge.name, Some(&resolved.version)) {
+        let override_resolved = match resolve_registry_dep(
+            registry,
+            &edge.name,
+            &override_spec,
+            &edge.edge_type,
+        )
+        .await
+        .map_err(|inner| chain_err(graph, parent, edge, inner))? {
+            Some(r) => r,
+            None => resolved,
+        };
+        receiver.on_event(BuildEvent::PackageResolved((&*override_resolved.manifest).into()));
+        process_dependency_with_resolved(graph, parent, edge, &override_resolved, config)
+    } else {
+        receiver.on_event(BuildEvent::PackageResolved((&*resolved.manifest).into()));
+        process_dependency_with_resolved(graph, parent, edge, &resolved, config)
+    };

Lands the demand-driven BFS resolver loop on top of the `ManifestProvider` trait from the preceding PR in the stack. The driver and its graph-building helpers exist as dead code in this PR: the entry-point switch that points `api.rs` and `builder`'s public `build_deps_*` / `resolve_*` chain at them is the third PR. Same dead-code-staging idiom as the earlier `state.rs` (#3079) and `select.rs` (#3083) splits.

What lands here, file by file:

* `resolver/demand/driver.rs` (new, ~700 lines): the `run_main_loop_bfs` entry. It owns the per-run `ManifestState` (the cache + waiters + failures store from #3079) and the `FetchQueues` scheduler (the push/pop/complete state machine from #3080), and pumps the `ManifestProvider` job stream through a `FuturesUnordered` of `tokio::task::JoinHandle`s (the multi-core spawn that gives the resolver native fan-out: `tokio::spawn` on native targets, the single-threaded `tokio::task::spawn_local` on wasm via the `#[cfg_attr]` toggle on the trait's `Send + Sync` bound). The `apply_fetch_result` glue feeds resolved manifests back into the graph through the new helpers in `builder.rs` (see below); the `select_edge` decision step from #3083 picks the next action per-edge (cache hit, version-cache hit, wait on an in-flight job, fail). The `handle_processed` wrapper around the graph-mutation step emits the existing `BuildEvent::Resolved` / `Failed` so progress receivers don't see a discontinuity once the cutover lands.

  A `#[cfg(test)]` module at the bottom holds the driver's unit-test scaffolding (`MockRegistryClient`, `CountingRegistry` wrapper for the single-flight property, the `create_*_manifest` helpers). One of those tests, `test_non_semver_exact_version_extract_single_flight`, is `#[ignore]`d in this PR with a reason string: it asserts on the `ManifestProvider` job count produced by a full `resolve(pkg, registry)` pipeline, which still routes through the legacy `RegistryClient::fetch_version_manifest` path in this PR. The cutover PR removes the `#[ignore]` once `resolve` is pointed at the demand driver. The other driver tests cover the loop's invariants in isolation (state transitions, waiter wake-up, schedule fairness) and pass under PR-B.

* `resolver/demand/mod.rs`, `resolver/demand/queue.rs`: the small re-export and visibility adjustments to expose `run_main_loop_bfs` and `ResolverManifestCache` at the `crate::resolver::demand` level so `builder.rs` can name them, and the queue's `FetchKey` / `FetchDone` types in the shape the driver consumes.

* `resolver/demand/state.rs`: a single attribute, `#[allow(dead_code)]` on the `ResolverManifestCache.entries` field. The driver writes the field via `ManifestState::into_resolver_cache()` at the end of each run; the reader is `ProjectCacheData::from_resolved` in the cutover PR's `api.rs` edit. Mirrors the symmetric annotation on the `ProjectCacheData` bridges in `service/cache.rs` from PR-A; both annotations come off when the entry-point switch wires the writer-chain to the reader-chain in PR-C.

* `resolver/builder.rs`: four new graph-building helpers extracted from `process_dependency`'s internal logic so the driver can reuse them without going back through the legacy entry-points, plus the new `pub(crate) async fn build_deps_with_config_output` that wraps the demand loop with the existing tracing + receiver wiring and returns the `ResolverManifestCache` the host needs to persist:

  - `pub(crate) fn try_reuse_dependency(...)`: hits the graph's existing-node index before issuing a fetch, so repeat references to the same `(name, resolved-version)` share one node.
  - `pub fn process_dependency_with_resolved(...)`: the edge-resolution tail that runs once a manifest is in hand. Creates or reuses the dependent node, attaches the edge, forwards the resolution mode flags.
  - `pub(crate) fn chain_err(...)`: lifts a `RegistryError` from the provider's job stream into the resolver's `ResolveError::WithChain` so the CLI's chain-aware error renderer still gets the parent → child causality string when the demand path fails the same way the legacy path used to.
  - `pub(crate) async fn handle_resolved_registry_manifest(...)`: the integration point between a resolved `CoreVersionManifest` and the graph. Caches under both the spec and the resolved version (so later lookups by either key hit memory), spawns the dependent-edge collection, fires `BuildEvent::Resolved`.

  All four are reachable only from the driver in this PR; the legacy `process_dependency` keeps its inline form and the legacy entry chain (`build_deps` / `build_deps_with_*` / `resolve` / `resolve_with_options`) keeps its old `R: RegistryClient` signatures. The new `build_deps_with_config_output` is the demand-side entry the cutover PR will route `build_deps_with_config` and `api.rs` through; it carries an `#[allow(dead_code)]` for this interim state with a one-line comment naming the next PR as its caller. The three import-line tweaks at the top of `builder.rs` (`CoreVersionManifest` joining the `crate::model::manifest` brace-group, the new `use` of `ResolverManifestCache` and `run_main_loop_bfs` from `crate::resolver::demand`, and `ManifestProvider` joining the `crate::service` brace-group) are the only edits to existing lines in this file. The orphaned preload-era functions (`gather_preload_deps`, `run_preload_phase`, `run_bfs_phase`) keep their existing signatures and live call paths: the cutover PR is what `#[allow(dead_code)]`-annotates them and the cleanup PR after the cutover deletes them.

The benchmark label is on this PR so the bench gate runs. Because the active resolver pipeline is unchanged in this PR (`resolve` still calls preload-then-BFS through the legacy `RegistryClient` interface), the expected bench numbers match PR-A on the standard npmjs workspace. The full `p1_resolve ≈ 2.4s / vCtx ≈ 18K` win shows up in PR-C alongside the entry-point flip.

Part 2/3 of the #3084 split.

Refs #3028 #3083 #3084 #3085

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-28T05:58:25Z

📊 pm-bench-phases · `68716f1` · linux (`ubuntu-latest`)

Workflow run — ant-design

PMs: utoo (this branch) · utoo-npm (latest published) · bun (latest)

npmjs.org

p0_full_cold

PM	wall	±σ	user	sys	RSS	pgMinor
bun	8.98s	0.16s	9.99s	9.98s	747M	319.4K
utoo-next	8.12s	0.15s	10.35s	11.88s	862M	118.4K
utoo-npm	8.74s	1.23s	10.46s	11.94s	845M	123.8K
utoo	0.00s	0.00s	10.00s	10.74s	866M	121.7K

PM	vCtx	iCtx	netRX	netTX	cache	node_mod	lock
bun	15.1K	17.5K	1.11G	6M	1.77G	1.66G	1M
utoo-next	108.6K	79.1K	1.08G	5M	1.62G	1.61G	2M
utoo-npm	122.8K	83.2K	1.08G	5M	1.62G	1.61G	2M
utoo	116.3K	63.9K	1.08G	5M	1.62G	-	-

p1_resolve

PM	wall	±σ	user	sys	RSS	pgMinor
bun	2.05s	0.16s	4.17s	1.11s	524M	166.8K
utoo-next	4.27s	2.41s	5.25s	1.78s	624M	80.8K
utoo-npm	4.60s	2.71s	5.50s	2.13s	617M	75.1K
utoo	2.84s	0.02s	5.27s	1.81s	620M	89.1K

PM	vCtx	iCtx	netRX	netTX	cache	node_mod	lock
bun	8.0K	4.9K	205M	3M	109M	-	1M
utoo-next	48.7K	70.4K	202M	2M	7M	3M	2M
utoo-npm	74.4K	93.2K	202M	2M	7M	3M	2M
utoo	48.2K	69.3K	203M	2M	7M	3M	2M

p3_cold_install

PM	wall	±σ	user	sys	RSS	pgMinor
bun	6.48s	0.18s	5.89s	9.50s	592M	190.6K
utoo-next	6.57s	1.39s	4.85s	10.65s	475M	59.2K
utoo-npm	7.32s	1.42s	4.92s	10.62s	483M	57.6K
utoo	6.25s	1.42s	4.84s	10.57s	538M	62.4K

PM	vCtx	iCtx	netRX	netTX	cache	node_mod	lock
bun	4.2K	6.1K	934M	3M	1.67G	1.67G	1M
utoo-next	105.7K	49.2K	904M	3M	1.61G	1.61G	2M
utoo-npm	101.8K	49.4K	904M	3M	1.61G	1.61G	2M
utoo	100.2K	50.8K	904M	3M	1.61G	1.61G	2M

p4_warm_link

PM	wall	±σ	user	sys	RSS	pgMinor
bun	3.30s	0.05s	0.20s	2.44s	135M	32.8K
utoo-next	2.34s	0.09s	0.53s	3.81s	80M	18.4K
utoo-npm	2.28s	0.15s	0.50s	3.79s	80M	18.6K
utoo	2.22s	0.13s	0.48s	3.84s	79M	18.3K

PM	vCtx	iCtx	netRX	netTX	cache	node_mod	lock
bun	249	25	5M	40K	1.82G	1.66G	1M
utoo-next	42.3K	19.4K	4K	10K	1.61G	1.61G	2M
utoo-npm	41.1K	18.6K	4K	9K	1.61G	1.61G	2M
utoo	42.1K	19.0K	7K	10K	1.61G	1.61G	2M

npmmirror.com: no output captured.

The cutover that closes the three-PR demand-resolver stack from #3028 (the original perf spec) and the integrated-form #3084 (the all-in-one landing at commit 4833992 that this split layers).

Flips the public entry-point chain in `crates/ruborist/src/resolver/builder.rs` and the host-side wiring in `crates/ruborist/src/service/api.rs` from the legacy `RegistryClient`-bound two-phase preload-then-BFS resolver to the demand-driven BFS loop landed by PR-B (#3086) on the `ManifestProvider` trait landed by PR-A (#3085).

Bundles the two runtime tunings whose payoff is the demand driver's single-flight de-duplication of concurrent same-package fetches: the four-reqwest-client round-robin pool in `service/http.rs` that fans HTTP keep-alive connections across the Cloudflare edge IPs the npmjs.org CNAME chain resolves to, and the resolver-side cap of 256 for non-semver npmjs from a new `get_resolver_manifests_concurrency_limit` function in `pm/util/user_config.rs` wired into the `BuildDepsConfig.concurrency` field via the matching swap in `pm/helper/ruborist_context.rs::Context::new`.

Lands the dead-code-allowance removals on the cross-PR scaffolding, because the wiring this PR introduces is the reader-chain those staged allowances pointed at: PR-A's `service/cache.rs::impl ProjectCacheData::{resolved_manifests, from_resolved}` block-level `#[allow(dead_code)]` with its "Bridges between the on-disk project cache and the demand resolver's neutral (name, spec, manifest) tuples. Wired up in the resolver cutover PR." comment, and PR-B's `resolver/demand/state.rs::ResolverManifestCache.entries` field-level `#[allow(dead_code)]` with its matching staging comment.

Lands the orphan annotations on the three now-unreachable preload-era functions: `#[allow(dead_code)]` plus the one-line "Retired preload path: orphaned now that resolution runs through the demand driver. Kept compiling until the cleanup PR removes the preload module." comment above each of `gather_preload_deps`, `run_preload_phase`, and `run_bfs_phase` in `builder.rs`, the three functions whose only caller was the legacy `build_deps_with_config` body that this PR rewrites.

And lands the matching `#[ignore]`-removal on the demand-driver's `test_non_semver_exact_version_extract_single_flight` test in `crates/ruborist/src/resolver/demand/driver.rs`. PR-B parked it with `#[ignore = "exercises the demand-driver pipeline, wired in the cutover PR"]` and a 7-line comment block explaining the counter-stays-at-zero failure mode of the assertion under the legacy `RegistryClient`-bound `resolve` entry. Once the cutover flips `resolve` and `resolve_with_options` to `where R: ManifestProvider, R::Error: Send` and `build_deps_with_config`'s body delegates to PR-B's `build_deps_with_config_output`, which in turn calls `run_main_loop_bfs`, the `CountingRegistry` wrapper around the inner `MockRegistryClient` does see its `ManifestProvider::request_manifests` invocation counter increment exactly once for the `shared@1.2.3` exact-version specifier the two sibling root-deps `a` and `b` both name (the demand driver's `FetchQueues` scheduler is the single-flight de-duplicator). The assertion `assert_eq!(shared_version_jobs.load(Ordering::Relaxed), 1)` then lands at `left: 1, right: 1` and the test passes, taking the lib-test count from PR-B's 181-passing-and-1-ignored to the cutover's 182-passing-and-0-ignored.

The eight-file commit body, in the order the changes chain through the call graph from the host's `api::build_lockfile` entry down to the registry adapter:

* `crates/ruborist/src/service/api.rs`: the host-side cutover.
The `BuildDepsConfig` builder chain in `build_lockfile` picks up a new `.with_project_cache(project_cache)` link that forwards the host's warm-cache snapshot into the resolver. (The snapshot is loaded by the separately-already-existing `pm::util::project_cache` reader from the on-disk `~/.utoo/cache/projects/<hash>.json` sidecar the `utoo install` writes after a successful resolve.) The demand pipeline's `ManifestState::seeded(project_cache.unwrap_or_default().resolved_manifests())` pre-seeding step then skips the manifest-fetch round-trip for the dependencies whose specs the lockfile already names a resolved-version-and-manifest for. This is the same warm-cache "skip the network for unchanged deps" shortcut the legacy preload phase used to take through the existing `RegistryClient::cache_version_manifest` shim, but driven through the demand pipeline's neutral `(name, spec, manifest)` tuple format that `ProjectCacheData::resolved_manifests` (PR-A's bridge, the inverse of `from_resolved`, now live) produces from the on-disk per-package-spec-to-version map.

The host's post-resolve manifest-cache export becomes the one-line `let project_cache = ProjectCacheData::from_resolved(build_deps_with_config_output(graph, registry, config, receiver).await?.entries);` (PR-A's matching `from_resolved` bridge, now live). The legacy code did this by iterating `registry.cache().export_version_manifests()` and parsing each "@scope/name@spec" key back into the `(name, spec, version)` triple via the now-unused `crate::model::util::parse_package_spec` import for the sidecar-writeback. The `parse_package_spec` import drops with the legacy loop.

The `build_deps_with_config` call that the legacy code made (unit-returning) becomes `build_deps_with_config_output` (the `Result<ResolverManifestCache, _>`-returning variant PR-B added) since the host needs the resolved-manifest stream for the sidecar write.

* `crates/ruborist/src/resolver/builder.rs`: the in-resolver cutover.
The `pub struct BuildDepsConfig` definition picks up the `pub project_cache: Option<ProjectCacheData>` field with the doc-comment "Host-provided project cache used to seed the resolver-owned manifest cache. Consumed by the demand mainloop; the preload path ignores it." (the preload-path-ignores-it wording is a forward-reference to the cleanup PR that deletes the preload path entirely). The matching `Default::default()` initializes the field to `None`, and the `impl BuildDepsConfig` block gains the builder-method `pub fn with_project_cache(mut self, project_cache: Option<ProjectCacheData>) -> Self { self.project_cache = project_cache; self }` for the api.rs call-site to chain.

The five public entry-point functions (`pub async fn build_deps`, `pub async fn build_deps_with_receiver`, `pub async fn build_deps_with_config`, `pub async fn resolve`, `pub async fn resolve_with_options`) have their type-parameter bound flipped from the angle-bracket form `<R: RegistryClient>` to the bare `<R>` with the trailing where-clause `where R: ManifestProvider, R::Error: Send`. The `Send` clause is needed because the demand driver's `tokio::spawn`-shaped job futures are bounded as `Send` on native targets; the trait's adapter implementor `UnifiedRegistry` from PR-A satisfies it because all its inner state (the `Arc<Store>`, the `String` registry_url, the `bool` supports_semver flag) is `Send + Sync`.
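The builder-method wiring described above can be sketched in isolation. The `with_project_cache` body is the one quoted in the commit message; everything else (the shape of `ProjectCacheData`, the `concurrency` field type, the `seed_entries` helper standing in for the `unwrap_or_default().resolved_manifests()` seeding step) is a hypothetical stand-in.

```rust
// Hypothetical stand-in: the real ProjectCacheData lives in service/cache.rs.
#[derive(Debug, Clone, Default, PartialEq)]
struct ProjectCacheData {
    entries: Vec<(String, String, String)>, // (name, spec, resolved version)
}

// Default derives project_cache = None, matching the description.
#[derive(Debug, Default)]
struct BuildDepsConfig {
    concurrency: usize,
    project_cache: Option<ProjectCacheData>,
}

impl BuildDepsConfig {
    // Builder link the host chains to forward its warm-cache snapshot.
    fn with_project_cache(mut self, project_cache: Option<ProjectCacheData>) -> Self {
        self.project_cache = project_cache;
        self
    }
}

// Hypothetical seeding step: a missing cache degrades to an empty seed.
fn seed_entries(config: &BuildDepsConfig) -> Vec<(String, String, String)> {
    config.project_cache.clone().unwrap_or_default().entries
}
```

Taking an `Option` in the builder lets the host pass its cache-load result straight through, so a cold run (no sidecar yet) and a warm run share one call-site.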
The body of `build_deps_with_config` is rewritten from the legacy three-line "log 'Starting dependency tree build', call `run_preload_phase(graph, registry, &config, receiver)`, call `run_bfs_phase(graph, registry, &config, receiver)?`, return `Ok(())`" sequence to "log 'Starting demand dependency build' with the new `peer_deps` and `concurrency` format args (the `skip_preload` field is dropped from the format string because the demand path has no preload phase to skip), call `build_deps_with_config_output(graph, registry, config, receiver).await?`, return `Ok(())`". The `_output` variant's `ResolverManifestCache` return is discarded by the unit-returning `build_deps_with_config`'s contract; the host that wants the cache calls the `_output` variant directly per the api.rs hunk above.

The three preload-era helpers get the orphan annotation: `fn gather_preload_deps(graph, peer_deps) -> Vec<(String, String)>` at line 194 of bA, `async fn run_preload_phase<R: RegistryClient, E: EventReceiver>(...)` at line 760 of bA, and `async fn run_bfs_phase<R: RegistryClient, E: EventReceiver>(...)` at line 812 of bA (line numbers refer to the perf/pm-resolver-select tip which both PR-A and PR-B's `builder.rs` retain). The annotation is a two-line `// Retired preload path: orphaned now that resolution runs through the demand driver. Kept compiling until the cleanup PR removes the preload module.` comment block above each function's doc-comment-or-signature, with the `#[allow(dead_code)]` attribute as the next line. The pattern matches the staged-dead-code annotations on the cross-PR scaffolding that come off in this same commit, reading like a small choreography where one set of allowances retires while the other set of allowances is redeemed by the wiring landing.

The `#[allow(dead_code)]` that PR-B placed on the `pub(crate) async fn build_deps_with_config_output` also comes off because the rewritten body of `build_deps_with_config` (and the api.rs cutover) are now both in-crate callers of it.

The three new `use` statements at the top of the file are the same three import-line tweaks PR-B added when the driver's helpers were introduced: `use crate::model::manifest::{CoreVersionManifest, NodeManifest};` joining `CoreVersionManifest` into the existing brace-list, `use crate::resolver::demand::{ResolverManifestCache, run_main_loop_bfs};` as a new line, and `use crate::service::{ManifestProvider, ProjectCacheData};` joining `ManifestProvider` into the existing brace-list. They remain unchanged in this commit (no further import additions on top of PR-B's).

* `crates/ruborist/src/service/http.rs`: the connection pool.

The single `static CLIENT: LazyLock<reqwest::Client> = LazyLock::new(|| { ClientBuilder::new()...build() });` becomes the array form `const CLIENT_POOL_SIZE: usize = 4; static CLIENTS: LazyLock<[reqwest::Client; CLIENT_POOL_SIZE]> = LazyLock::new(|| { std::array::from_fn(|_| ClientBuilder::new()...build().unwrap()) });` with a `static NEXT_CLIENT_IDX: std::sync::atomic::AtomicUsize = std::sync::atomic::AtomicUsize::new(0);` round-robin counter. The public function `pub fn get_client() -> &'static reqwest::Client` keeps its signature unchanged, so all existing callers throughout the crate graph (the tarball-fetch path in pm's installer, the registry-discovery fetch in the workspace bootstrap, the resolver-side fetch in the manifest adapter) get the new pooled behavior transparently. The body becomes `&CLIENTS[NEXT_CLIENT_IDX.fetch_add(1, std::sync::atomic::Ordering::Relaxed) % CLIENT_POOL_SIZE]`.
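The round-robin selection described for `get_client` can be sketched with the standard library alone. The `Client` struct here is a hypothetical stand-in for `reqwest::Client`, and the pool is an ordinary struct rather than the `LazyLock` static the commit describes, so the sketch stays dependency-free.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

const CLIENT_POOL_SIZE: usize = 4;

// Hypothetical stand-in for reqwest::Client; `id` only makes the
// rotation observable.
#[derive(Debug)]
struct Client {
    id: usize,
}

struct ClientPool {
    clients: [Client; CLIENT_POOL_SIZE],
    next: AtomicUsize, // shared round-robin cursor
}

impl ClientPool {
    fn new() -> Self {
        ClientPool {
            // from_fn passes the slot index, used here as the client id.
            clients: std::array::from_fn(|id| Client { id }),
            next: AtomicUsize::new(0),
        }
    }

    // Each call takes the next slot; fetch_add returns the previous
    // value, so concurrent callers get distinct consecutive indices.
    fn get_client(&self) -> &Client {
        let idx = self.next.fetch_add(1, Ordering::Relaxed) % CLIENT_POOL_SIZE;
        &self.clients[idx]
    }
}
```

`Ordering::Relaxed` is enough for this use because the counter only spreads load; nothing else is synchronized through it.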
The four-way fan-out spreads the demand driver's `FuturesUnordered`-shaped batched manifest fetches across different `reqwest::Client`s, each of which DNS-resolves `registry.npmjs.org` to a different Cloudflare edge IP (the public npmjs front-end rotates edge IPs for any client opening a fresh TLS handshake). This bypasses the per-edge-IP rate limiter on which the single-client version's keep-alive pool concentrated all the traffic, the documented cause of the +14 % wall regression against the next baseline on the resolve phase in PR-A's pre-amend bench. The pool members are independent (each has its own connection pool, its own DNS cache, its own HTTP/2-or-1 multiplex state, its own auth credentials chain), which makes the fan-out transparent to the resolver: the resolver issues N jobs through `request_manifests`, the adapter's body in `service/registry/provider.rs` calls `get_client()` for each job to get the next pool member, and the four pool members independently maintain their connection-pool state. The aggregate throughput ceiling is `CLIENT_POOL_SIZE`×(per-member connection cap) rather than 1×, which is the headroom the demand driver's higher in-flight cap (256 vs the legacy 64) uses to turn its single-flight savings into wall-clock improvement.

* `crates/pm/src/util/user_config.rs`: the resolver-side concurrency knob.
A new private helper `fn resolver_manifest_concurrency_limit(default_cap: usize, is_npm_default_registry: bool, semver_mode_override: Option<bool>) -> usize` has the following branch table. If the registry is the public npmjs.org default and semver-mode is off, return 256. (Semver-mode off is the abbreviated-metadata "versions-only" code path, where the demand driver fetches the versions list for a package once and then the per-version manifests on demand, deduping same-version requests across multiple parent edges through `FetchQueues`'s single-flight.) If the user has an explicit cap in `~/.utoorc.json::resolver.manifest_concurrency_limit`, return that user-specified value, preserving the user's intent; it overrides both the 256 default for npmjs and the registry default for other registries. If semver-mode is on, return the registry default, which for npmjs.org is the existing `get_manifests_concurrency_limit` default of 64 (carried over so that semver-mode behaves the same as the legacy resolver on tarball-side concurrency). (Semver-mode on is the path where the registry resolves the range-spec against the full manifest: a per-package round-trip that the demand driver's `select_full_manifest` step issues once per range-spec encountered, with no opportunity to amortize because each range-spec is unique to a single edge's request.) For non-npmjs registries, return the registry default in all branches.

The public wrapper `pub async fn get_resolver_manifests_concurrency_limit() -> usize` reads the active registry's hostname from the user-config's `npm.registry` field (or the env var), reads the semver-mode flag from the user-config's `resolver.semver_mode` field, and calls the helper. The three branch-coverage unit tests (one per branch, all marked `#[tokio::test]` and using `tempfile` to isolate the test's view of `~/.utoorc.json`) document the wiring.
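The branch table above can be read as a small pure function. The sketch below is one possible reading, not the PR's helper. Three things are assumptions: the `user_cap` parameter is hypothetical (the real helper's signature has no such argument, and where the `~/.utoorc.json` override is applied is not shown in the description), an unset `semver_mode_override` is treated as semver-mode off, and `default_cap` is taken to be the registry default.

```rust
/// Cap for the public npmjs registry with semver-mode off (value from the description).
const NPMJS_DEMAND_CAP: usize = 256;

/// Illustrative reading of the branch table.
/// `user_cap` is a hypothetical parameter standing in for the explicit
/// `~/.utoorc.json` setting; it is not part of the real helper's signature.
pub fn resolver_limit_sketch(
    user_cap: Option<usize>,
    default_cap: usize,
    is_npm_default_registry: bool,
    semver_mode_override: Option<bool>,
) -> usize {
    // An explicit user setting overrides every default.
    if let Some(cap) = user_cap {
        return cap;
    }
    // Assumption: no override means semver-mode is off.
    let semver_mode = semver_mode_override.unwrap_or(false);
    if is_npm_default_registry && !semver_mode {
        // Single-flight dedup makes the wider fan-out worthwhile here.
        NPMJS_DEMAND_CAP
    } else {
        // Semver-mode on, or any non-npmjs registry: keep the registry default.
        default_cap
    }
}
```

Checking the user cap first reflects the statement that it overrides both defaults, even though the description lists the 256 branch first. With a registry default of 64, the sketch yields 256 for npmjs with semver-mode off and 64 for npmjs with semver-mode on.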
The existing `pub async fn get_manifests_concurrency_limit() -> usize` for the tarball-side concurrency (the cap that the installer's parallel-extract step honors when staging package tarballs into `node_modules`) keeps its existing 64-default shape. The tarball side and the resolver side are semantically separate: the tarball side is bounded by disk I/O and the npmjs tarball CDN's per-IP rate, and the resolver side is bounded by the metadata endpoint's per-IP throughput after the Cloudflare edge fan-out.

* `crates/pm/src/helper/ruborist_context.rs`: the wire. The single-line change in the `use crate::util::user_config::{...}` brace-list import changes `get_manifests_concurrency_limit` to `get_resolver_manifests_concurrency_limit`, and the call site in `Context::new`'s body that previously read `concurrency: get_manifests_concurrency_limit().await,` now reads `concurrency: get_resolver_manifests_concurrency_limit().await,`. As a result the `Context.concurrency` field, which the `BuildDepsConfig::with_concurrency` builder reads in `api::build_lockfile`'s config assembly, carries the 256-for-npmjs cap rather than the 64-for-tarballs cap the legacy version did. Net file delta is +2 / -2 (one line of import-list change, one line of call-site change); the renamed entry moves to its alphabetical position within the multi-line brace block, so the file's line count is unchanged.

* `crates/ruborist/src/service/cache.rs`: the dead-code allowance comes off PR-A's `impl ProjectCacheData::{resolved_manifests, from_resolved}` block.
The two methods now have live in-crate call sites: the api.rs hunk above calls `from_resolved` for the post-resolve sidecar write, and the demand driver's `ManifestState::seeded` (the downstream consumer of `BuildDepsConfig::with_project_cache`) calls `resolved_manifests` for the warm-cache read. The 5-line comment block "// Bridges between the on-disk project cache and the demand resolver's neutral `(name, spec, manifest)` tuples. Wired up in the resolver cutover PR — staged here so the trait + adapter PRs stay self-contained." and the `#[allow(dead_code)]` attribute right above the `impl ProjectCacheData {` opening brace both come off. The bodies of `pub(crate) fn resolved_manifests(&self) -> Vec<(String, String, Arc<CoreVersionManifest>)>` and `pub(crate) fn from_resolved(entries: Vec<(String, String, Arc<CoreVersionManifest>)>) -> Self` (the methods that fold the on-disk per-package spec→version→manifest map into the flat neutral-tuple vec, and the inverse fold) are unchanged in this commit; only the allowance attribute comes off.

* `crates/ruborist/src/resolver/demand/state.rs`: the symmetric dead-code allowance comes off the `entries` field of PR-B's `pub(crate) struct ResolverManifestCache { ... entries ... }`. The 6-line comment block "// The resolver writes this on each run via `ManifestState::into_resolver_cache`; the reader is `ProjectCacheData::from_resolved` in the cutover PR's `api.rs` edit. Staged write-only here so the driver lands ahead of the entry-point switch — see the matching `#[allow(dead_code)]` on `ProjectCacheData`'s bridges in `service/cache.rs` introduced by the preceding (provider) PR." and the `#[allow(dead_code)]` attribute right above the `pub(crate) entries:` line both come off.
The field itself, `pub(crate) entries: Vec<(String, String, Arc<CoreVersionManifest>)>`, is unchanged; only the allowance comes off, because the api.rs cutover's read of `manifest_cache.entries` for the `ProjectCacheData::from_resolved` call is the now-live in-crate reader the staging comment named.

* `crates/ruborist/src/resolver/demand/driver.rs`: the `#[ignore]` comes off the demand driver's single-flight test. Two things are removed from above the `#[tokio::test]` attribute on `async fn test_non_semver_exact_version_extract_single_flight()`: the 7-line comment block "// The `resolve` entry in `builder` still routes through the legacy `RegistryClient::fetch_version_manifest` path in this PR — the cutover that points it at the demand driver lives in the follow-up PR in this stack. Until then the `CountingRegistry`'s `ManifestProvider`-side job counter stays at zero (the legacy path bypasses it), so the single-flight assertion below has nothing to count. The cutover PR removes this `#[ignore]` alongside flipping the entry-point bounds." and the `#[ignore = "exercises the demand-driver pipeline, wired in the cutover PR"]` attribute. The test body is unchanged, and the `#[tokio::test]` and the `async fn` stay. The rest of the driver is unchanged from PR-B's tip: the `MockRegistryClient` scaffolding; the `CountingRegistry` wrapper, which decorates an inner `RegistryClient` impl with an `AtomicUsize`-per-package-name request counter visible on the outer wrapper for the test assertions; the `create_version_manifest` and `create_full_manifest` helpers; and the half-dozen other driver-internal tests (the queue-pop-ordering test, the waiter-wake-on-fetch-complete test, the state-transition-from-pending-to-cached test, the fetch-error-propagates-to-WithChain test, etc.) that cover the loop's invariants in isolation without going through the public `resolve` entry.
`cargo test -p utoo-ruborist --lib` on this commit's tree reports `182 passed; 0 failed; 0 ignored` (the formerly ignored test joins the passing list because its assertion now compares 1 with 1, where the legacy path produced `left: 0, right: 1`).

The bench gate on this PR is the load-bearing verification of the entire stack's perf claim. Its target is the integrated-form #3084's bench number, recorded at workflow run `26551226857` on the integrated commit `48339925`: `utoo` p1_resolve `2.45s ± 0.13` and allocation-side `vCtx` `18.1K` on the standard `ant-design` workspace, taken from the `p1_resolve` row of the phase table produced by the `bench-phases-linux` job of the `pm-e2e-bench.yml` workflow. Compared to the `utoo-next` baseline column in the same comment (`2.86s ± 0.04` and `vCtx 47.4K`), the cutover's targeted improvement is the 14 % wall-clock reduction and the 2.6× vCtx reduction the perf spec promised. Compared to the `bun` column in the same workflow (`1.95s ± 0.06` and `vCtx 8.0K`), the remaining gap is what the post-cutover cleanup PR's warm-cache rationalization addresses: the cleanup deletes the `crate::resolver::preload` module entirely, replaces the `UnifiedRegistry`'s stateful `OnceMap`-shaped inflight tracking with the demand pipeline's own queue, retires the `BuildEvent::Preload*` event variants and their progress-receiver hooks, drops the `RegistryClient::cache_version_manifest` shim, and lets the per-cleanup dead-code lint cover what remains.

The cleanup is bench-tracked against this cutover's bench result as its baseline, to confirm that the wall-clock and the vCtx held after the deletions removed the carry-along scaffolding. The cleanup is the long-tail follow-up and does not block this cutover's review.
The integrated-form #3084 (the PR Andrew has open on the `perf/pm-resolver-driver` branch at commit 4833992, the all-in-one form that this three-PR stack replaces with a layered equivalent) stays open as the bench reference until this three-PR stack lands in `next`. After the stack lands, #3084 closes as superseded by the same payload landed in three reviewable layers. Its commit history at the integrated form's tip remains reachable through the GitHub PR's timeline as the historical record of the original integration that motivated the split.

The two sibling bench gates in this stack (PR-A #3085's gate at the amended type-level-scaffolding sha `c16893ec` and PR-B #3086's gate at the rebased driver-landing sha `63a18ad6`) are the no-regression floor that this PR's win is measured against. The expected outcome on each of the two sibling gates is "flat vs `utoo-next`" because the active runtime path in each is byte-identical to next: the trait surface in PR-A is unreferenced from any live call site, the demand driver in PR-B is in the binary but the entry-points still go through the legacy preload-and-BFS chain, and the preload-era functions that receive the orphan annotation are still the only callees of `build_deps_with_config`'s body. The cutover-side delta, which is what this PR introduces, flips the active path from the legacy resolver to the demand pipeline, lights up the runtime tunings, and gives the bench machine's resolve-phase timer the integrated form's 2.45-second number to land at.

Three-of-three of the #3084 split. The merge order across the stack is PR-A #3085 (the trait, no runtime delta), then PR-B #3086 (the driver, no runtime delta because the entry-points still bypass it), then this PR (the entry-point flip and the runtime tunings, the runtime delta that delivers the perf claim). GitHub's stacked-PR base tracking automatically retargets each PR's base from its stack predecessor's branch to `next` as each predecessor merges.
The bench gate fires on each push to each PR's branch independently, so the bench comments on the three PRs appear in chronological order of the bench machine's queue pull, which matches the stack order modulo the GitHub Actions runner's per-job timing.

Also fixes a latent gap in the shared fetch retry layer that the cutover's higher concurrency surfaces. `service/fetch.rs`'s `classify_status` mapped every non-404/429/5xx HTTP status to `FetchError::Permanent` via a catch-all arm, so a `406 Not Acceptable` was never retried. npmjs's Cloudflare edge intermittently answers a manifest request with 406 under heavy concurrent fan-out: a different package each run, roughly one per run, a transient content-negotiation hiccup rather than a real unsatisfiable `Accept` (which would 406 every request). The legacy two-phase resolver's gentler 64-wide single-client fetch rarely tripped it; the demand driver's 256-wide four-client pool fan-out trips it on nearly every e2e run, failing `utoo install` on a single stray 406. Reclassify 406 as `Retryable` alongside 429 so the existing five-delay backoff absorbs it. The buggy catch-all predates this stack (it lives in the shared layer both paths use), but the cutover is what makes it user-visible, so the one-line fix rides with it.

Refs #3028, #3083 (the select-and-state scaffolding already in next as the stack base), #3084 (the integrated-form reference, still open at 4833992 on the `perf/pm-resolver-driver` branch), #3085 (PR-A, the trait + adapter, draft at sha `c16893ec`), #3086 (PR-B, the driver + four graph helpers, draft at sha `63a18ad6`).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
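The 406 reclassification described in the commit message above might look roughly like the sketch below. It is an illustration under stated assumptions, not the contents of `service/fetch.rs`: the function takes a bare `u16` status code, the `NotFound` variant for 404 is hypothetical (the description names only `Permanent` and `Retryable`), and treating 5xx as retryable is inferred from the "non-404/429/5xx" wording.

```rust
/// Simplified stand-in for the fetch layer's error classes.
/// `NotFound` is a hypothetical variant; only `Permanent` and `Retryable`
/// are named in the description.
#[derive(Debug, PartialEq, Eq)]
pub enum FetchError {
    NotFound,
    Retryable,
    Permanent,
}

/// Classifies a non-success HTTP status code.
pub fn classify_status(status: u16) -> FetchError {
    match status {
        404 => FetchError::NotFound,
        // 406 now sits next to 429, so the backoff loop retries it
        // instead of failing the whole install on one stray response.
        406 | 429 => FetchError::Retryable,
        // Assumption: server errors were already retryable before the fix.
        500..=599 => FetchError::Retryable,
        // Every other status still falls through to the catch-all.
        _ => FetchError::Permanent,
    }
}
```

Before the fix, 406 would have reached the final arm. Listing it explicitly leaves the catch-all in place for statuses such as 403, which stay permanent.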

elrrrrrrr added the A-Pkg Manager (Area: Package Manager) and benchmark (Run pm-bench on PR) labels May 28, 2026

elrrrrrrr requested review from fireairforce, killagu, xusd320 and yuzheng14 May 28, 2026 04:16

gemini-code-assist Bot reviewed May 28, 2026

elrrrrrrr force-pushed the perf/pm-resolver-provider branch from c4430b2 to c16893e May 28, 2026 05:34

elrrrrrrr force-pushed the perf/pm-resolver-driver-stage branch from 0360cd6 to 63a18ad May 28, 2026 05:38

elrrrrrrr mentioned this pull request May 28, 2026

perf(pm): demand resolver — entry-point cutover + runtime tunings (3/3) #3087

Draft

elrrrrrrr wants to merge 1 commit into perf/pm-resolver-provider from perf/pm-resolver-driver-stage

github-actions Bot commented May 28, 2026

📊 pm-bench-phases · 68716f1 · linux (ubuntu-latest)

npmjs.org: p0_full_cold, p1_resolve, p3_cold_install, p4_warm_link

elrrrrrrr commented May 28, 2026 • edited

📊 pm-bench-phases · `68716f1` · linux (`ubuntu-latest`)