
perf(pm): demand resolver — driver loop + graph helpers (2/3)#3086

Draft
elrrrrrrr wants to merge 1 commit into perf/pm-resolver-provider from perf/pm-resolver-driver-stage

@elrrrrrrr elrrrrrrr commented May 28, 2026

Part 2/3 of the #3084 split — the demand-driven BFS resolver loop. Stacks on PR-A (#3085, the ManifestProvider trait + the registry-backed adapter). The cutover that points the public build_deps_* and resolve_* entry-points + api::build_lockfile at the demand pipeline, and lands the two runtime tunings the demand driver's single-flight makes pay off, is PR-C — which I'll open after both #3085 and this PR's bench gates are in.

The stack after the amendment

The original PR-A had bundled the HTTP-pool and the resolver-side concurrency knob alongside the trait. Bench on that earlier shape came in at a 14% wall-clock regression on p1_resolve vs utoo-next (3.27s vs 2.86s on the standard ant-design workspace) because those tunings only pay off on a single-flight pipeline (which the legacy resolver isn't) and overcommit the npmjs per-IP rate-limit when applied to the per-edge legacy fetcher. Per the lesson from the earlier multi-PR rebuilds (memory note: "split PRs by 'does it ship the perf' not just 'additive vs delete'"), the tunings moved to the cutover PR where they're paired with the entry-point flip that makes them sound. After that amendment the stack reads:

next ──► #3083 (select + PackageVersions, ready)
          └── #3085 PR-A (provider trait + adapter; bench expected flat vs next)
                └── this PR (PR-B, driver + four graph helpers; bench expected flat vs PR-A)
                      └── PR-C (cutover + runtime tunings + dead-code-allowance removals; bench-gated win)

#3084 stays open as the integrated-form reference baseline until the three split PRs all land in next. Its bench-comment record (the 2.45s ± 0.13 p1_resolve, vCtx 18.1K on the same ant-design workspace) is the target the cutover PR's bench needs to match.

What lands here

crates/ruborist/src/resolver/demand/driver.rs (new, +701) is the BFS loop and the fetch pipeline. It owns the per-run ManifestState cache + waiters + failures store from #3079, the FetchQueues scheduler with single-flight de-dup from #3080, and the pure per-edge decision step select_edge → EdgeStep / WaitKey / FetchPlan from #3083. The pipeline pumps the trait's job stream through a FuturesUnordered of tokio::task::JoinHandles, the multi-core spawn shape that gives the resolver native fan-out (tokio::spawn on native, the single-threaded tokio::task::spawn_local on wasm, gated by the same #[cfg_attr] on the trait's Send + Sync bound from PR-A). The apply_fetch_result glue feeds resolved manifests back into the graph through the four new graph-building helpers in builder.rs (see below). The handle_processed wrapper fires the existing BuildEvent::Resolved and BuildEvent::Failed events so progress receivers see the same event surface across both the legacy two-phase pipeline and the demand pipeline once the cutover wires the entry-points across.
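
The single-flight de-dup the driver leans on can be illustrated with a std-only sketch. `SingleFlight`, `Request`, and the string keys below are hypothetical stand-ins chosen for illustration; they are not the `FetchQueues` API from #3080.

```rust
use std::collections::HashMap;

/// Outcome of asking for a key: either the caller must start the fetch,
/// or a fetch is already in flight and the caller was parked as a waiter.
#[derive(Debug, PartialEq)]
enum Request {
    StartFetch,
    Joined,
}

/// Hypothetical single-flight table: one in-flight job per key,
/// together with the ids of every edge waiting on it.
#[derive(Default)]
struct SingleFlight {
    in_flight: HashMap<String, Vec<u32>>,
}

impl SingleFlight {
    /// Register `waiter` for `key`; only the first caller starts a fetch.
    fn request(&mut self, key: &str, waiter: u32) -> Request {
        match self.in_flight.get_mut(key) {
            Some(waiters) => {
                waiters.push(waiter);
                Request::Joined
            }
            None => {
                self.in_flight.insert(key.to_string(), vec![waiter]);
                Request::StartFetch
            }
        }
    }

    /// Mark the fetch for `key` as done and return the waiters to wake.
    fn complete(&mut self, key: &str) -> Vec<u32> {
        self.in_flight.remove(key).unwrap_or_default()
    }
}
```

In this toy model, two edges asking for `shared@1.2.3` produce one `StartFetch` and one `Joined`, and `complete` hands both waiter ids back so each edge can be stitched from the single result.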

crates/ruborist/src/resolver/builder.rs (+134 / -3) gains four graph-building helpers next to the existing process_dependency (one pub, three pub(crate)), plus a new pub(crate) async entry function. None of them has a caller in this PR outside the demand driver itself; the legacy entry chain (build_deps, build_deps_with_receiver, build_deps_with_config, resolve, resolve_with_options) keeps its existing R: RegistryClient signatures and its existing preload-then-BFS body, so the active runtime path is unchanged from PR-A. The orphan-comment annotations on gather_preload_deps / run_preload_phase / run_bfs_phase aren't added yet (the cutover is what makes them orphans).

| New top-level item in builder.rs | Purpose |
| --- | --- |
| pub(crate) async fn build_deps_with_config_output<R, E>(graph, registry, config, receiver) -> Result<ResolverManifestCache, ResolveError<R::Error>> where R: ManifestProvider, R::Error: Send, E: EventReceiver | The demand-side entry the cutover PR will route both the public build_deps_with_config (unit-returning entry) and api::build_lockfile's host-side cache-export step through. Body is the tracing::info! line, the run_main_loop_bfs call, and the Ok(manifest_cache). Carries #[allow(dead_code)] with a one-line comment naming PR-C as the caller. The three new use statements at the top of the file (CoreVersionManifest joining the existing crate::model::manifest::NodeManifest brace-list, the new use crate::resolver::demand::{ResolverManifestCache, run_main_loop_bfs}, and ManifestProvider joining crate::service::ProjectCacheData) are the only edits to existing lines in this file. The signatures of the legacy entries are untouched. |
| pub(crate) fn try_reuse_dependency(graph, parent, edge, resolved_name, resolved_version) -> Option<EdgeStitched> | Hits the graph's existing-node index before the driver issues a fetch, so repeat references to the same (name, resolved-version) share one graph node rather than producing parallel duplicates. Extracted from the inline logic in process_dependency so the demand path can call it without going back through the legacy entry chain. |
| pub fn process_dependency_with_resolved(graph, parent, edge, resolved_name, manifest, dev_deps, peer_deps, source) -> NodeIndex | The edge-resolution tail: given a resolved CoreVersionManifest for a child of parent, creates or reuses the dependent node, attaches the dependency-typed edge, sets the resolution mode flags. The pub (rather than pub(crate)) visibility matches the existing pub on process_dependency: same audience surface, same level of abstraction. |
| pub(crate) fn chain_err<E>(parent_chain, err) -> ResolveError<E> | The error-chaining wrapper that lifts a per-job RegistryError from the provider's job stream into the resolver's existing ResolveError::WithChain, so the CLI's chain-aware error renderer (pm::format_print) still gets the "outermost name → inner name → cause" causality string when the demand path's job-batch reports a failed manifest fetch, matching the equivalent error shape the legacy single-fetch path emits. |
| pub(crate) async fn handle_resolved_registry_manifest<R, E>(graph, registry, receiver, parent, edge, resolved_name, manifest, state) -> Result<HandleResult, ResolveError<R::Error>> where R: ManifestProvider, R::Error: Send, E: EventReceiver | The "we have a resolved version manifest, integrate it" step. Caches the manifest under both the requested spec and the resolved-version key in the per-run ManifestState::version, builds the resolved-version's edges into the queue of work the driver hasn't dispatched yet, fires BuildEvent::Resolved(name, version) on the receiver. |
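
The reuse-before-fetch behaviour described for try_reuse_dependency can be sketched with a toy graph. Everything here (`Graph`, `NodeId`, `try_reuse`, `add_resolved`) is a hypothetical simplification for illustration and does not mirror the real signatures in builder.rs.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a graph node handle.
type NodeId = usize;

/// Toy graph: nodes, an index from (name, version) to the node holding it,
/// and parent -> child edges.
#[derive(Default)]
struct Graph {
    nodes: Vec<(String, String)>,
    index: HashMap<(String, String), NodeId>,
    edges: Vec<(NodeId, NodeId)>,
}

impl Graph {
    /// Create a graph whose node 0 is the root package.
    fn with_root(name: &str) -> Self {
        let mut g = Graph::default();
        g.nodes.push((name.to_string(), "0.0.0".to_string()));
        g
    }

    /// On an index hit, attach the parent edge to the existing node and
    /// return it instead of creating a duplicate. Returns None on a miss.
    fn try_reuse(&mut self, parent: NodeId, name: &str, version: &str) -> Option<NodeId> {
        let key = (name.to_string(), version.to_string());
        let existing = *self.index.get(&key)?;
        self.edges.push((parent, existing));
        Some(existing)
    }

    /// Create a node for (name, version), index it, and attach the parent edge.
    fn add_resolved(&mut self, parent: NodeId, name: &str, version: &str) -> NodeId {
        let id = self.nodes.len();
        self.nodes.push((name.to_string(), version.to_string()));
        self.index.insert((name.to_string(), version.to_string()), id);
        self.edges.push((parent, id));
        id
    }
}
```

The point of checking the index first is that a second reference to the same resolved version costs one edge insertion and no fetch.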

crates/ruborist/src/resolver/demand/mod.rs (+13 / -5) declares pub mod driver and re-exports driver::run_main_loop_bfs and state::ResolverManifestCache at the demand-module level so builder.rs's new entry function can name them as crate::resolver::demand::{run_main_loop_bfs, ResolverManifestCache}. The crate-level re-exports already exist through the lib.rs's pub mod resolver.

crates/ruborist/src/resolver/demand/queue.rs (+5 / -3) tweaks the visibility of the FetchKey and FetchDone types so the driver can name them outside the queue module, and adjusts the FetchPriority enum's variant ordering so the driver's pop_next walks them in the right BFS-first order (semver-resolved-version > full-manifest > already-cached-version-lookup).
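
Why variant ordering matters can be shown with a small sketch: `#[derive(Ord)]` on an enum compares variants by declaration position, so reordering variants changes which job a max-first queue pops. The `Priority` enum, its variant names, and the use of `BinaryHeap` are assumptions made for illustration; the actual FetchPriority definition and queue structure are not shown in this PR description.

```rust
use std::collections::BinaryHeap;

/// Hypothetical priority enum. `#[derive(Ord)]` orders variants by
/// declaration position, so later variants compare greater.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Priority {
    CachedVersionLookup, // lowest
    FullManifest,
    ResolvedVersion, // highest
}

/// Drain a max-heap of (priority, key) pairs, highest priority first.
fn pop_order(jobs: Vec<(Priority, &'static str)>) -> Vec<&'static str> {
    let mut heap: BinaryHeap<(Priority, &'static str)> = jobs.into_iter().collect();
    let mut out = Vec::new();
    while let Some((_, key)) = heap.pop() {
        out.push(key);
    }
    out
}
```

Moving a variant up or down in the declaration is therefore a behavioural change, not a cosmetic one, whenever the derived ordering drives scheduling.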

crates/ruborist/src/resolver/demand/state.rs (+6 / -0) adds one annotation: #[allow(dead_code)] on the ResolverManifestCache.entries field. The field is written by ManifestState::into_resolver_cache() at the end of run_main_loop_bfs and read by ProjectCacheData::from_resolved(cache.entries) in the cutover PR's api.rs edit. In this PR it's written-only (the only writer path is reachable through PR-B's dead-coded build_deps_with_config_output), so the warning would fire under the strict cargo clippy --all-targets -- -D warnings invocation CLAUDE.md mandates. The annotation has a 5-line comment naming the cutover PR's api.rs change as the reader and pointing at the matching annotation on PR-A's ProjectCacheData bridges in service/cache.rs. Both annotations come off in PR-C when the writer-chain joins the reader-chain.

The cutover-dependent test ignore

The driver's #[cfg(test)] mod tests block (in driver.rs itself) holds the unit-test scaffolding the driver needs: a MockRegistryClient that simulates a registry's full-manifest and version-manifest responses, a CountingRegistry wrapper that delegates to an inner mock but increments an AtomicUsize for every request_manifests call that involves a given package name (so single-flight de-duplication for that name is observable), and the create_*_manifest helpers. One test in this block — test_non_semver_exact_version_extract_single_flight — sets up a root with two siblings a and b both depending on the same exact-version shared@1.2.3, drives a full resolve(pkg, &counting_registry) through the resolver crate's public entry, and asserts that the counter for shared@1.2.3 reads exactly 1 (the demand driver's single-flight de-dup folded the two waiters onto one provider job).

In this PR the public resolve entry still goes through the legacy RegistryClient::fetch_version_manifest path — the entry-point bound flip from R: RegistryClient to R: ManifestProvider is the cutover's payload — so the counter (which lives on the CountingRegistry's ManifestProvider impl) never increments and the assertion's "left: 0, right: 1" panic is exactly the failure mode you'd see on a clean checkout of this PR. The test gets #[ignore = "exercises the demand-driver pipeline, wired in the cutover PR"] with a multi-line comment explaining the reason; the cutover PR removes the #[ignore] once resolve delegates to build_deps_with_config_output. The other tests in the driver's test module exercise the loop's invariants directly (state transitions, the schedule-and-pump cycle, the waiter wake-up after a fetch completes) without going through the public entry, and they pass under this PR's intermediate state. cargo test -p utoo-ruborist --lib on 63a18ad6 reports 181 passed; 0 failed; 1 ignored.
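
The counting-wrapper pattern behind that test can be sketched without async or tokio. The `Provider` trait, `MockProvider`, and `Counting` below are hypothetical, synchronous simplifications; the real ManifestProvider trait and CountingRegistry are not reproduced here.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical, synchronous stand-in for the provider trait.
trait Provider {
    fn request(&self, name: &str, spec: &str) -> Option<String>;
}

/// Mock that answers from a fixed table of "name@spec" -> resolved version.
struct MockProvider {
    versions: HashMap<String, String>,
}

impl Provider for MockProvider {
    fn request(&self, name: &str, spec: &str) -> Option<String> {
        self.versions.get(&format!("{name}@{spec}")).cloned()
    }
}

/// Wrapper that delegates to `inner` and counts requests for one package name.
struct Counting<P> {
    inner: P,
    watched: String,
    hits: AtomicUsize,
}

impl<P: Provider> Provider for Counting<P> {
    fn request(&self, name: &str, spec: &str) -> Option<String> {
        if name == self.watched {
            self.hits.fetch_add(1, Ordering::Relaxed);
        }
        self.inner.request(name, spec)
    }
}
```

The counter only moves when a call actually goes through the wrapper's trait impl, which is why a resolve path that bypasses that impl leaves it at zero.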

Bench expectation

The benchmark label is on. The driver is in the binary but unreferenced from any entry-point that the bench harness drives: the harness runs utoo install on the ant-design workspace and times the resolve phase, and utoo install's resolver entry is the legacy api::build_lockfile → build_deps_with_config → run_preload_phase then run_bfs_phase chain. The active runtime path is therefore the same as PR-A's, and the expected p1_resolve and vCtx numbers match PR-A's flat-vs-next baseline (≈ 2.86s wall and ≈ 47K vCtx on ant-design after PR-A's amendment dropped the runtime-tuning regression). The full ≈ 2.45s / ≈ 18K vCtx win, which is what the integrated-form #3084's bench measured, surfaces at the cutover PR's bench gate.

Local hygiene on the rebased tree

The cross-stack rebase replayed PR-B's single commit 0360cd64 onto the amended PR-A tip c16893ec, yielding the current 63a18ad6 whose parent is c16893ec. There were no rebase conflicts because PR-B touches builder.rs and demand/* while PR-A's amendment dropped service/http.rs + pm/util/user_config.rs + pm/helper/ruborist_context.rs, which are disjoint file sets. On the rebased tree:

  • cargo check -p utoo-ruborist -p utoo-pm: clean.
  • cargo clippy -p utoo-ruborist -p utoo-pm --all-targets -- -D warnings --no-deps (the strict form CLAUDE.md mandates after a Rust edit): clean.
  • cargo test -p utoo-ruborist --lib: 181 passed; 0 failed; 1 ignored (the #[ignore]'d single-flight test mentioned above).
  • cargo fmt -p utoo-ruborist -p utoo-pm --check: clean.

The workspace-wide cargo check still hits a pre-existing turbopack_nodejs::{EcmascriptBuildNodeChunk, EcmascriptBuildNodeEntryChunk} "not in the root" error in pack-api/src/webpack_stats.rs — that's the next.js submodule's drift, present on next and every branch in the repo, unrelated to this stack.

Refs #3028, #3083, #3084, base PR-A #3085. The cutover PR will reference all four.

🤖 Generated with Claude Code

@elrrrrrrr elrrrrrrr added A-Pkg Manager Area: Package Manager benchmark Run pm-bench on PR labels May 28, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces the BFS driver loop (run_main_loop_bfs) for demand-driven dependency resolution, staging the orchestration of manifest fetches, cache state, and graph building. In builder.rs, the handle_resolved_registry_manifest function contains an inefficiency where it calls process_dependency when an override is present, leading to redundant registry requests and lookups. It is recommended to directly resolve the override spec and integrate it into the graph.

Comment on lines +1099 to +1109
    let processed = if graph
        .check_override(parent, &edge.name, Some(&resolved.version))
        .is_some()
    {
        process_dependency(graph, registry, parent, edge, config)
            .await
            .map_err(|inner| chain_err(graph, parent, edge, inner))?
    } else {
        receiver.on_event(BuildEvent::PackageResolved((&*resolved.manifest).into()));
        process_dependency_with_resolved(graph, parent, edge, &resolved, config)
    };

medium

In handle_resolved_registry_manifest, calling process_dependency when an override is present is highly inefficient. Since process_dependency is designed to resolve a dependency from scratch, it will re-resolve the original spec (which we already resolved to get manifest!) before checking the override and resolving the override spec. This results in redundant registry requests and lookups.

Instead, we should directly resolve the override_spec using resolve_registry_dep and then integrate it into the graph using process_dependency_with_resolved.

Suggested change
    - let processed = if graph
    -     .check_override(parent, &edge.name, Some(&resolved.version))
    -     .is_some()
    - {
    -     process_dependency(graph, registry, parent, edge, config)
    -         .await
    -         .map_err(|inner| chain_err(graph, parent, edge, inner))?
    - } else {
    -     receiver.on_event(BuildEvent::PackageResolved((&*resolved.manifest).into()));
    -     process_dependency_with_resolved(graph, parent, edge, &resolved, config)
    - };
    + let processed = if let Some(override_spec) = graph.check_override(parent, &edge.name, Some(&resolved.version)) {
    +     let override_resolved = match resolve_registry_dep(
    +         registry,
    +         &edge.name,
    +         &override_spec,
    +         &edge.edge_type,
    +     )
    +     .await
    +     .map_err(|inner| chain_err(graph, parent, edge, inner))? {
    +         Some(r) => r,
    +         None => resolved,
    +     };
    +     receiver.on_event(BuildEvent::PackageResolved((&*override_resolved.manifest).into()));
    +     process_dependency_with_resolved(graph, parent, edge, &override_resolved, config)
    + } else {
    +     receiver.on_event(BuildEvent::PackageResolved((&*resolved.manifest).into()));
    +     process_dependency_with_resolved(graph, parent, edge, &resolved, config)
    + };

@elrrrrrrr elrrrrrrr force-pushed the perf/pm-resolver-provider branch from c4430b2 to c16893e Compare May 28, 2026 05:34
Lands the demand-driven BFS resolver loop on top of the
`ManifestProvider` trait from the preceding PR in the stack. The
driver and its graph-building helpers exist as dead code in this PR
— the entry-point switch that points `api.rs` and `builder`'s
public `build_deps_*` / `resolve_*` chain at them is the third
PR. Same dead-code-staging idiom as the earlier `state.rs` (#3079)
and `select.rs` (#3083) splits.

What lands here, file by file:

* `resolver/demand/driver.rs` (new, ~700 lines): the `run_main_loop_bfs`
  entry — owns the per-run `ManifestState` (the cache + waiters +
  failures store from #3079) and the `FetchQueues` scheduler (the
  push/pop/complete state machine from #3080), pumps the
  `ManifestProvider` job stream through a `FuturesUnordered` of
  `tokio::task::JoinHandle`s (the multi-core spawn that gives the
  resolver native fan-out — `tokio::spawn` on native targets, the
  single-threaded `tokio::task::spawn_local` on wasm via the
  `#[cfg_attr]` toggle on the trait's `Send + Sync` bound). The
  `apply_fetch_result` glue feeds resolved manifests back into the
  graph through the new helpers in `builder.rs` (see below); the
  `select_edge` decision step from #3083 picks the next action
  per-edge (cache hit, version-cache hit, wait on an in-flight job,
  fail). The `handle_processed` wrapper around the
  graph-mutation step emits the existing `BuildEvent::Resolved` /
  `Failed` so progress receivers don't see a discontinuity once
  the cutover lands. A `#[cfg(test)]` module at the bottom holds
  the driver's unit-test scaffolding (`MockRegistryClient`,
  `CountingRegistry` wrapper for the single-flight property, the
  `create_*_manifest` helpers). One of those tests —
  `test_non_semver_exact_version_extract_single_flight` — is
  `#[ignore]`d in this PR with a reason string: it asserts on the
  `ManifestProvider` job count produced by a full
  `resolve(pkg, registry)` pipeline, which still routes through
  the legacy `RegistryClient::fetch_version_manifest` path in this
  PR. The cutover PR removes the `#[ignore]` once `resolve` is
  pointed at the demand driver. The other driver tests cover the
  loop's invariants in isolation (state transitions, waiter wake-up,
  schedule fairness) and pass under PR-B.

* `resolver/demand/mod.rs`, `resolver/demand/queue.rs`: the small
  re-export and visibility adjustments to expose `run_main_loop_bfs`
  and `ResolverManifestCache` at the `crate::resolver::demand`
  level so `builder.rs` can name them, and the queue's `FetchKey` /
  `FetchDone` types in the shape the driver consumes.

* `resolver/demand/state.rs`: a single attribute — `#[allow(dead_code)]`
  on the `ResolverManifestCache.entries` field. The driver writes
  the field via `ManifestState::into_resolver_cache()` at the end
  of each run; the reader is `ProjectCacheData::from_resolved` in
  the cutover PR's `api.rs` edit. Mirrors the symmetric annotation
  on the `ProjectCacheData` bridges in `service/cache.rs` from
  PR-A — both annotations come off when the entry-point switch
  wires the writer-chain to the reader-chain in PR-C.

* `resolver/builder.rs`: four new graph-building helpers extracted
  from `process_dependency`'s internal logic so the driver can
  reuse them without going back through the legacy entry-points,
  plus the new `pub(crate) async fn build_deps_with_config_output`
  that wraps the demand loop with the existing tracing + receiver
  wiring and returns the `ResolverManifestCache` the host needs to
  persist:

  - `pub(crate) fn try_reuse_dependency(...)`: hits the graph's
    existing-node index before issuing a fetch, so repeat references
    to the same `(name, resolved-version)` share one node.
  - `pub fn process_dependency_with_resolved(...)`: the
    edge-resolution tail that runs once a manifest is in hand —
    creates or reuses the dependent node, attaches the edge,
    forwards the resolution mode flags.
  - `pub(crate) fn chain_err(...)`: lifts a `RegistryError` from
    the provider's job stream into the resolver's
    `ResolveError::WithChain` so the CLI's chain-aware error
    renderer still gets the parent → child causality string when
    the demand path fails the same way the legacy path used to.
  - `pub(crate) async fn handle_resolved_registry_manifest(...)`:
    the integration point between a resolved `CoreVersionManifest`
    and the graph — caches under both the spec and the resolved
    version (so later lookups by either key hit memory), spawns the
    dependent-edge collection, fires `BuildEvent::Resolved`.

  All four are reachable only from the driver in this PR; the
  legacy `process_dependency` keeps its inline form and the
  legacy entry chain (`build_deps` / `build_deps_with_*` /
  `resolve` / `resolve_with_options`) keeps its old
  `R: RegistryClient` signatures. The new
  `build_deps_with_config_output` is the demand-side entry the
  cutover PR will route `build_deps_with_config` and `api.rs`
  through; it carries an `#[allow(dead_code)]` for this interim
  state with a one-line comment naming the next PR as its caller.

  The three import-line tweaks at the top of `builder.rs` —
  `CoreVersionManifest` joining the `crate::model::manifest`
  brace-group, the new `use` of `ResolverManifestCache` and
  `run_main_loop_bfs` from `crate::resolver::demand`, and
  `ManifestProvider` joining the `crate::service` brace-group —
  are the only edits to existing lines in this file. The
  preload-era functions (`gather_preload_deps`, `run_preload_phase`,
  `run_bfs_phase`) keep their existing signatures and live call
  paths; the cutover PR `#[allow(dead_code)]`-annotates them once
  they become orphans, and the cleanup PR after the cutover deletes
  them.
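
The error-chaining idea behind `chain_err` can be sketched as follows. `FetchError`, `ChainedError`, and `chain_error` are hypothetical names for illustration only; the real `ResolveError::WithChain` shape and the renderer's exact output format are not shown in this commit message.

```rust
use std::fmt;

/// Hypothetical per-job error from a provider.
#[derive(Debug)]
struct FetchError(String);

/// Hypothetical resolver error that carries the dependency chain
/// from the outermost package down to the one that failed.
#[derive(Debug)]
struct ChainedError {
    chain: Vec<String>,
    cause: FetchError,
}

impl fmt::Display for ChainedError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}: {}", self.chain.join(" -> "), self.cause.0)
    }
}

/// Attach the parent chain plus the failing edge name to a provider error.
fn chain_error(parent_chain: &[&str], edge: &str, cause: FetchError) -> ChainedError {
    let mut chain: Vec<String> = parent_chain.iter().map(|s| s.to_string()).collect();
    chain.push(edge.to_string());
    ChainedError { chain, cause }
}
```

Keeping the chain as data rather than a pre-formatted string lets a renderer decide how to print the causality path.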

The benchmark label is on this PR so the bench gate runs. Because
the active resolver pipeline is unchanged in this PR (`resolve`
still calls preload-then-BFS through the legacy
`RegistryClient` interface), the expected bench numbers match
PR-A on the standard npmjs workspace. The full
`p1_resolve ≈ 2.4s / vCtx ≈ 18K` win shows up in PR-C alongside
the entry-point flip.

Part 2/3 of the #3084 split.

Refs #3028 #3083 #3084 #3085

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@elrrrrrrr elrrrrrrr force-pushed the perf/pm-resolver-driver-stage branch from 0360cd6 to 63a18ad Compare May 28, 2026 05:38
@github-actions

📊 pm-bench-phases · 68716f1 · linux (ubuntu-latest)

Workflow run — ant-design

PMs: utoo (this branch) · utoo-npm (latest published) · bun (latest)

npmjs.org

p0_full_cold

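| PM | wall | ±σ | user | sys | RSS | pgMinor |
| --- | --- | --- | --- | --- | --- | --- |
| bun | 8.98s | 0.16s | 9.99s | 9.98s | 747M | 319.4K |
| utoo-next | 8.12s | 0.15s | 10.35s | 11.88s | 862M | 118.4K |
| utoo-npm | 8.74s | 1.23s | 10.46s | 11.94s | 845M | 123.8K |
| utoo | 0.00s | 0.00s | 10.00s | 10.74s | 866M | 121.7K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bun | 15.1K | 17.5K | 1.11G | 6M | 1.77G | 1.66G | 1M |
| utoo-next | 108.6K | 79.1K | 1.08G | 5M | 1.62G | 1.61G | 2M |
| utoo-npm | 122.8K | 83.2K | 1.08G | 5M | 1.62G | 1.61G | 2M |
| utoo | 116.3K | 63.9K | 1.08G | 5M | 1.62G | - | - |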

p1_resolve

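| PM | wall | ±σ | user | sys | RSS | pgMinor |
| --- | --- | --- | --- | --- | --- | --- |
| bun | 2.05s | 0.16s | 4.17s | 1.11s | 524M | 166.8K |
| utoo-next | 4.27s | 2.41s | 5.25s | 1.78s | 624M | 80.8K |
| utoo-npm | 4.60s | 2.71s | 5.50s | 2.13s | 617M | 75.1K |
| utoo | 2.84s | 0.02s | 5.27s | 1.81s | 620M | 89.1K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bun | 8.0K | 4.9K | 205M | 3M | 109M | - | 1M |
| utoo-next | 48.7K | 70.4K | 202M | 2M | 7M | 3M | 2M |
| utoo-npm | 74.4K | 93.2K | 202M | 2M | 7M | 3M | 2M |
| utoo | 48.2K | 69.3K | 203M | 2M | 7M | 3M | 2M |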

p3_cold_install

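| PM | wall | ±σ | user | sys | RSS | pgMinor |
| --- | --- | --- | --- | --- | --- | --- |
| bun | 6.48s | 0.18s | 5.89s | 9.50s | 592M | 190.6K |
| utoo-next | 6.57s | 1.39s | 4.85s | 10.65s | 475M | 59.2K |
| utoo-npm | 7.32s | 1.42s | 4.92s | 10.62s | 483M | 57.6K |
| utoo | 6.25s | 1.42s | 4.84s | 10.57s | 538M | 62.4K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bun | 4.2K | 6.1K | 934M | 3M | 1.67G | 1.67G | 1M |
| utoo-next | 105.7K | 49.2K | 904M | 3M | 1.61G | 1.61G | 2M |
| utoo-npm | 101.8K | 49.4K | 904M | 3M | 1.61G | 1.61G | 2M |
| utoo | 100.2K | 50.8K | 904M | 3M | 1.61G | 1.61G | 2M |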

p4_warm_link

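| PM | wall | ±σ | user | sys | RSS | pgMinor |
| --- | --- | --- | --- | --- | --- | --- |
| bun | 3.30s | 0.05s | 0.20s | 2.44s | 135M | 32.8K |
| utoo-next | 2.34s | 0.09s | 0.53s | 3.81s | 80M | 18.4K |
| utoo-npm | 2.28s | 0.15s | 0.50s | 3.79s | 80M | 18.6K |
| utoo | 2.22s | 0.13s | 0.48s | 3.84s | 79M | 18.3K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bun | 249 | 25 | 5M | 40K | 1.82G | 1.66G | 1M |
| utoo-next | 42.3K | 19.4K | 4K | 10K | 1.61G | 1.61G | 2M |
| utoo-npm | 41.1K | 18.6K | 4K | 9K | 1.61G | 1.61G | 2M |
| utoo | 42.1K | 19.0K | 7K | 10K | 1.61G | 1.61G | 2M |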

npmmirror.com: no output captured.

elrrrrrrr added a commit that referenced this pull request May 28, 2026
The cutover that closes the three-PR demand-resolver stack from
#3028 (the original perf spec) and the integrated-form #3084 (the
all-in-one landing at commit 4833992 that this split layers).

Flips the public entry-point chain in
`crates/ruborist/src/resolver/builder.rs` and the host-side wiring
in `crates/ruborist/src/service/api.rs` from the legacy
`RegistryClient`-bound two-phase preload-then-BFS resolver to the
demand-driven BFS loop landed by PR-B (#3086) on the
`ManifestProvider` trait landed by PR-A (#3085).

Bundles the two runtime tunings whose payoff is the demand driver's
single-flight de-duplication of concurrent same-package fetches:

* the four-reqwest-client round-robin pool in `service/http.rs` that
  fans HTTP keep-alive connections across the Cloudflare edge IPs
  the npmjs.org CNAME chain resolves to;
* the resolver-side cap of 256 for non-semver npmjs from a new
  `get_resolver_manifests_concurrency_limit` function in
  `pm/util/user_config.rs`, wired into the
  `BuildDepsConfig.concurrency` field via the matching swap in
  `pm/helper/ruborist_context.rs::Context::new`.

Lands the dead-code-allowance removals on the cross-PR scaffolding,
because the wiring this PR introduces is the reader-chain those
staged allowances pointed at:

* PR-A's `service/cache.rs::impl ProjectCacheData::{resolved_manifests,
  from_resolved}` block-level `#[allow(dead_code)]` with its
  "Bridges between the on-disk project cache and the demand
  resolver's neutral (name, spec, manifest) tuples. Wired up in
  the resolver cutover PR." comment;
* PR-B's `resolver/demand/state.rs::ResolverManifestCache.entries`
  field-level `#[allow(dead_code)]` with its matching staging
  comment.

Lands the orphan annotations on the three now-unreachable
preload-era functions: `#[allow(dead_code)]` plus the one-line
"Retired preload path: orphaned now that resolution runs through
the demand driver. Kept compiling until the cleanup PR removes the
preload module." comment above each of `gather_preload_deps`,
`run_preload_phase`, and `run_bfs_phase` in `builder.rs`. Their
only caller was the legacy `build_deps_with_config` body that this
PR rewrites.

Lands the matching `#[ignore]`-removal on the demand-driver's
`test_non_semver_exact_version_extract_single_flight` test in
`crates/ruborist/src/resolver/demand/driver.rs`. PR-B parked it
with `#[ignore = "exercises the demand-driver pipeline, wired in
the cutover PR"]` and a 7-line comment block explaining the
counter-stays-at-zero failure mode of the assertion under the
legacy `RegistryClient`-bound `resolve` entry. Once the cutover
flips `resolve` and `resolve_with_options` to `where R:
ManifestProvider, R::Error: Send`, and `build_deps_with_config`'s
body delegates to PR-B's `build_deps_with_config_output` (which in
turn calls `run_main_loop_bfs`), the `CountingRegistry` wrapper
around the inner `MockRegistryClient` does see its
`ManifestProvider::request_manifests` invocation counter increment
exactly once for the `shared@1.2.3` exact-version specifier that
the two sibling root-deps `a` and `b` both name (the demand
driver's `FetchQueues` scheduler is the single-flight
de-duplicator). The assertion
`assert_eq!(shared_version_jobs.load(Ordering::Relaxed), 1)` lands
at `left: 1, right: 1` and the test passes, taking the lib-test
count from PR-B's 181-passing-and-1-ignored to the cutover's
182-passing-and-0-ignored.
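
A round-robin pool of the kind described for `service/http.rs` can be sketched with the standard library alone. `Client` here is a hypothetical placeholder rather than `reqwest::Client`, and `ClientPool` / `pick` are illustrative names, not the functions in the actual file.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

const POOL_SIZE: usize = 4;

/// Hypothetical stand-in for an HTTP client; only its slot id matters here.
#[derive(Debug)]
struct Client {
    id: usize,
}

/// Fixed-size pool that hands out clients in round-robin order.
struct ClientPool {
    clients: [Client; POOL_SIZE],
    next: AtomicUsize,
}

impl ClientPool {
    fn new() -> Self {
        ClientPool {
            clients: std::array::from_fn(|id| Client { id }),
            next: AtomicUsize::new(0),
        }
    }

    /// Pick the next client. `fetch_add` wraps on overflow, and the modulo
    /// keeps the index inside the array bounds.
    fn pick(&self) -> &Client {
        let idx = self.next.fetch_add(1, Ordering::Relaxed) % POOL_SIZE;
        &self.clients[idx]
    }
}
```

`Ordering::Relaxed` is enough in this sketch because the counter only spreads load; no other memory is synchronized through it.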

The eight-file commit body, in the order the changes
chain through the call graph from the host's
`api::build_lockfile` entry down to the registry adapter:

* `crates/ruborist/src/service/api.rs` — the host-side cutover.
  The `BuildDepsConfig` builder chain in `build_lockfile` picks
  up a new `.with_project_cache(project_cache)` link that
  forwards the host's warm-cache snapshot (loaded by the
  separately-already-existing `pm::util::project_cache` reader
  from the on-disk `~/.utoo/cache/projects/<hash>.json` sidecar
  the `utoo install` writes after a successful resolve) into
  the resolver, so the demand pipeline's
  `ManifestState::seeded(project_cache.unwrap_or_default().resolved_manifests())`
  pre-seeding step skips the manifest-fetch round-trip for the
  dependencies whose specs the lockfile already names a
  resolved-version-and-manifest for — the same warm-cache
  "skip the network for unchanged deps" shortcut the legacy
  preload phase used to take through the existing
  `RegistryClient::cache_version_manifest` shim, but driven
  through the demand pipeline's neutral `(name, spec, manifest)`
  tuple format that `ProjectCacheData::resolved_manifests`
  (PR-A's bridge, the inverse of `from_resolved`, now live)
  produces from the on-disk per-package-spec-to-version map.
  The host's post-resolve manifest-cache export — which the
  legacy code did by iterating
  `registry.cache().export_version_manifests()` and parsing
  each "@scope/name@spec" key back into the `(name, spec,
  version)` triple via the now-unused
  `crate::model::util::parse_package_spec` import for the
  sidecar-writeback — becomes the one-line `let project_cache
  = ProjectCacheData::from_resolved(build_deps_with_config_output(
  graph, registry, config, receiver).await?.entries);` (PR-A's
  matching `from_resolved` bridge, now live). The
  `parse_package_spec` import drops with the legacy loop. The
  `build_deps_with_config` call that the legacy code made
  (unit-returning) becomes `build_deps_with_config_output`
  (the `Result<ResolverManifestCache, _>`-returning variant
  PR-B added) since the host needs the resolved-manifest
  stream for the sidecar write.
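
The two bridge directions named above (flat tuples into a cache, and a cache back into flat tuples) can be sketched as a round trip. `CacheData` and `Entry` are hypothetical simplifications: the third tuple element stands in for a manifest with a plain version string, and the nested-map layout is an assumption, not the real `ProjectCacheData` schema.

```rust
use std::collections::BTreeMap;

/// Hypothetical neutral tuple: (package name, requested spec, resolved version).
type Entry = (String, String, String);

/// Hypothetical cache shape: name -> spec -> resolved version.
#[derive(Debug, Default, PartialEq)]
struct CacheData {
    packages: BTreeMap<String, BTreeMap<String, String>>,
}

impl CacheData {
    /// Build the nested map from the resolver's flat tuples.
    fn from_resolved(entries: Vec<Entry>) -> Self {
        let mut packages: BTreeMap<String, BTreeMap<String, String>> = BTreeMap::new();
        for (name, spec, version) in entries {
            packages.entry(name).or_default().insert(spec, version);
        }
        CacheData { packages }
    }

    /// Flatten the nested map back into tuples for seeding a new run.
    fn resolved_manifests(&self) -> Vec<Entry> {
        self.packages
            .iter()
            .flat_map(|(name, specs)| {
                specs
                    .iter()
                    .map(move |(spec, version)| (name.clone(), spec.clone(), version.clone()))
            })
            .collect()
    }
}
```

Because both directions speak the same tuple shape, the resolver never needs to know how the host stores the cache on disk.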

* `crates/ruborist/src/resolver/builder.rs` — the in-resolver
  cutover. The `pub struct BuildDepsConfig` definition picks
  up the `pub project_cache: Option<ProjectCacheData>` field
  with the doc-comment "Host-provided project cache used to
  seed the resolver-owned manifest cache. Consumed by the
  demand mainloop; the preload path ignores it." (the
  preload-path-ignores-it wording is a forward-reference to
  the cleanup PR that deletes the preload path entirely); the
  matching `Default::default()` initializes the field to
  `None`; the `impl BuildDepsConfig` block gains the
  builder-method
  `pub fn with_project_cache(mut self, project_cache:
  Option<ProjectCacheData>) -> Self { self.project_cache =
  project_cache; self }` for the api.rs call-site to chain.
  The five public entry-point functions (`pub async fn
  build_deps`, `pub async fn build_deps_with_receiver`, `pub
  async fn build_deps_with_config`, `pub async fn resolve`,
  `pub async fn resolve_with_options`) have their type-parameter
  bound flipped from the angle-bracket form `<R:
  RegistryClient>` to the bare `<R>` with the trailing where-clause
  `where R: ManifestProvider, R::Error: Send` (the `Send` clause
  is needed because the demand driver's `tokio::spawn`-shaped
  job futures are bounded as `Send` on native targets, and the
  trait's adapter implementor `UnifiedRegistry` from PR-A
  satisfies it because all its inner state — the `Arc<Store>`,
  the `String` registry_url, the `bool` supports_semver flag —
  is `Send + Sync`). The body of `build_deps_with_config` is
  rewritten from the legacy three-line "log
  'Starting dependency tree build', call
  `run_preload_phase(graph, registry, &config, receiver)`,
  call `run_bfs_phase(graph, registry, &config,
  receiver)?`, return `Ok(())`" sequence to "log
  'Starting demand dependency build' with the new
  `peer_deps` and `concurrency` format args (the
  `skip_preload` field is dropped from the format string
  because the demand path has no preload phase to skip),
  call `build_deps_with_config_output(graph, registry,
  config, receiver).await?`, return `Ok(())` (the
  `_output` variant's `ResolverManifestCache` return is
  discarded by the unit-returning `build_deps_with_config`'s
  contract; the host that wants the cache calls the
  `_output` variant directly per the api.rs hunk above)."
  The three preload-era helpers
  (`fn gather_preload_deps(graph, peer_deps) ->
  Vec<(String, String)>` at line 194 of bA, `async fn
  run_preload_phase<R: RegistryClient, E: EventReceiver>(...)`
  at line 760 of bA, `async fn run_bfs_phase<R:
  RegistryClient, E: EventReceiver>(...)` at line 812 of
  bA — line numbers refer to the perf/pm-resolver-select
  tip which both PR-A and PR-B's `builder.rs` retain) get
  the orphan annotation: a two-line `// Retired preload
  path: orphaned now that resolution runs through the
  demand driver. Kept compiling until the cleanup PR
  removes the preload module.` comment block above each
  function's doc comment or signature, with the
  `#[allow(dead_code)]` attribute on the next line. This
  is the same staged-dead-code pattern used on the
  cross-PR scaffolding, whose allowances come off in this
  same commit: one set of allowances is added for the
  retired preload path while the other set is removed
  because the wiring that uses the scaffolding now lands.
  The `#[allow(dead_code)]` that PR-B placed on the
  `pub(crate) async fn build_deps_with_config_output`
  also comes off because the rewritten body of
  `build_deps_with_config` and the api.rs cutover are
  now both in-crate callers of it. The three `use`
  statements at the top of the file are unchanged in this
  commit (no further import additions on top of PR-B's).
  They are the import-line tweaks PR-B added when the
  driver's helpers were introduced:
  `use crate::model::manifest::{CoreVersionManifest,
  NodeManifest};` (adding `CoreVersionManifest` to the
  existing brace-list), `use crate::resolver::demand::{
  ResolverManifestCache, run_main_loop_bfs};` (a new
  line), and `use crate::service::{ManifestProvider,
  ProjectCacheData};` (adding `ManifestProvider` to the
  existing brace-list).

* `crates/ruborist/src/service/http.rs` — the connection
  pool. The single `static CLIENT: LazyLock<reqwest::Client>
  = LazyLock::new(|| { ClientBuilder::new()...build() });`
  becomes the array form `const CLIENT_POOL_SIZE: usize =
  4; static CLIENTS: LazyLock<[reqwest::Client;
  CLIENT_POOL_SIZE]> = LazyLock::new(|| {
  std::array::from_fn(|_| ClientBuilder::new()...build()
  .unwrap()) });` with a `static NEXT_CLIENT_IDX:
  std::sync::atomic::AtomicUsize =
  std::sync::atomic::AtomicUsize::new(0);` round-robin
  counter. The public function `pub fn get_client() ->
  &'static reqwest::Client` keeps its signature
  unchanged so all existing callers throughout the crate
  graph (the tarball-fetch path in pm's installer, the
  registry-discovery fetch in the workspace bootstrap,
  the resolver-side fetch in the manifest adapter) get
  the new pooled behavior transparently — the body
  becomes `&CLIENTS[NEXT_CLIENT_IDX.fetch_add(1,
  std::sync::atomic::Ordering::Relaxed) %
  CLIENT_POOL_SIZE]`. The four-way fan-out lets each of
  the demand driver's `FuturesUnordered`-shaped batched
  manifest fetches land on a different
  `reqwest::Client`, and each client DNS-resolves
  `registry.npmjs.org` to a different Cloudflare edge IP
  (the public npmjs front-end rotates edge IPs for any
  client opening a fresh TLS handshake). That bypasses
  the per-edge-IP rate limiter at which the
  single-client version's keep-alive pool concentrated
  all the traffic. The same rate limiter was behind the
  +14 % wall-clock regression vs the next baseline on the
  resolve phase documented in PR-A's pre-amendment bench.
  The pool members share no state (each has its own
  connection pool, its own DNS cache, its own
  HTTP/2-or-1 multiplex state, its own auth credentials
  chain), which makes the fan-out transparent to the
  resolver: the resolver issues N jobs through
  `request_manifests`, the adapter's body in
  `service/registry/provider.rs` calls `get_client()`
  for each job to get the next pool member, and the four
  pool members maintain their connection-pool state
  independently. The aggregate throughput ceiling is
  `CLIENT_POOL_SIZE` × (per-member connection cap)
  rather than 1×, which is the headroom the demand
  driver's higher in-flight cap (256 vs the legacy 64)
  uses to turn its single-flight savings into wall-clock
  improvement.

* `crates/pm/src/util/user_config.rs` — the resolver-side
  concurrency knob. A new private helper
  `fn resolver_manifest_concurrency_limit(default_cap:
  usize, is_npm_default_registry: bool,
  semver_mode_override: Option<bool>) -> usize` whose
  branch table reads: if the registry is the public
  npmjs.org default and semver mode is off, return 256
  (this is the abbreviated-metadata "versions-only" code
  path, where the demand driver fetches the versions list
  for a package once and then the per-version manifests
  on demand, deduplicating same-version requests across
  multiple parent edges through `FetchQueues`'s
  single-flight); if the user has an explicit cap in
  `~/.utoorc.json::resolver.manifest_concurrency_limit`,
  return that user-specified value, preserving the
  user's intent (it overrides both the 256 default for
  npmjs and the registry default for other registries);
  if semver mode is on, return the registry default,
  which for npmjs.org is the existing
  `get_manifests_concurrency_limit` default of 64
  (carried over so that semver mode behaves the same as
  the legacy resolver, which used the tarball-side cap);
  for non-npmjs registries, return the registry default
  in all branches. Semver mode is the path where the
  registry resolves range specs against the full
  manifest: a per-package round-trip that the demand
  driver's `select_full_manifest` step issues once per
  range spec encountered, with no opportunity to
  amortize because each range spec is unique to a single
  edge's request. The public wrapper
  `pub async fn get_resolver_manifests_concurrency_limit()
  -> usize` reads the active registry's hostname from
  the user-config's `npm.registry` field (or the env
  var), reads the semver-mode flag from the user-config's
  `resolver.semver_mode` field, and calls the helper. The
  three branch-coverage unit tests (one per branch, all
  annotated with `#[tokio::test]` and using `tempfile` to
  isolate the test's view of `~/.utoorc.json`) document
  the wiring. The existing `pub async fn
  get_manifests_concurrency_limit() -> usize` for the
  tarball-side concurrency (the cap that the installer's
  parallel-extract step honors when staging package
  tarballs into `node_modules`) keeps its existing
  64-default shape — the tarball-side and the
  resolver-side are semantically separate, with the
  tarball-side bounded by disk-I/O and the
  npmjs-tarball-CDN's per-IP rate, and the resolver-side
  bounded by the metadata-endpoint's
  per-IP-after-Cloudflare-edge-fan-out throughput.

* `crates/pm/src/helper/ruborist_context.rs` — the wire.
  The single-line change in the `use
  crate::util::user_config::{...}` brace-list import
  changes `get_manifests_concurrency_limit` to
  `get_resolver_manifests_concurrency_limit`, and the
  single-line change at the call site in
  `Context::new`'s body that previously read
  `concurrency: get_manifests_concurrency_limit().await,`
  now reads `concurrency:
  get_resolver_manifests_concurrency_limit().await,` so
  the `Context.concurrency` field — which the
  `BuildDepsConfig::with_concurrency` builder reads in
  `api::build_lockfile`'s config assembly — carries the
  256-for-npmjs cap rather than the 64-for-tarballs cap
  that the legacy version carried. Net file delta is
  +2 / -2 (one import-list line, one call-site line), so
  the file's line count is unchanged; the renamed entry
  only shifts position within the multi-line import
  brace-list to stay in alphabetical order.

* `crates/ruborist/src/service/cache.rs` — the
  dead-code-allowance comes off PR-A's
  `impl ProjectCacheData::{resolved_manifests, from_resolved}`
  block. The two methods are now in-crate-callsite-live
  because the api.rs hunk above calls `from_resolved`
  for the post-resolve sidecar write and the
  `BuildDepsConfig::with_project_cache`'s downstream
  consumption in the demand driver's
  `ManifestState::seeded` calls `resolved_manifests` for
  the warm-cache read. The 5-line comment block "//
  Bridges between the on-disk project cache and the
  demand resolver's neutral `(name, spec, manifest)`
  tuples. Wired up in the resolver cutover PR — staged
  here so the trait + adapter PRs stay self-contained."
  and the `#[allow(dead_code)]` attribute right above
  the `impl ProjectCacheData {` opening brace both come
  off. The `pub(crate) fn resolved_manifests(&self) ->
  Vec<(String, String, Arc<CoreVersionManifest>)>` and
  the `pub(crate) fn from_resolved(entries: Vec<(String,
  String, Arc<CoreVersionManifest>)>) -> Self` method
  bodies (which fold the on-disk per-package
  spec→version→manifest map into the flat neutral-tuple
  vec, and the inverse fold) are unchanged in this
  commit; only the allowance attribute comes off.

* `crates/ruborist/src/resolver/demand/state.rs` — the
  symmetric dead-code-allowance comes off PR-B's
  `pub(crate) struct ResolverManifestCache { ...
  entries ... }`'s `entries` field. The 6-line comment
  "// The resolver writes this on each run via
  `ManifestState::into_resolver_cache`; the reader is
  `ProjectCacheData::from_resolved` in the cutover PR's
  `api.rs` edit. Staged write-only here so the driver
  lands ahead of the entry-point switch — see the
  matching `#[allow(dead_code)]` on `ProjectCacheData`'s
  bridges in `service/cache.rs` introduced by the
  preceding (provider) PR." block and the
  `#[allow(dead_code)]` attribute right above the
  `pub(crate) entries:` line both come off. The field
  itself — `pub(crate) entries: Vec<(String, String,
  Arc<CoreVersionManifest>)>` — is unchanged; just the
  allowance comes off, because the api.rs cutover's
  read of `manifest_cache.entries` for the
  `ProjectCacheData::from_resolved` call is the now-live
  in-crate reader the staging comment named.

* `crates/ruborist/src/resolver/demand/driver.rs` — the
  `#[ignore]` comes off the demand driver's single-flight
  test. The 7-line "// The `resolve` entry in `builder`
  still routes through the legacy
  `RegistryClient::fetch_version_manifest` path in this
  PR — the cutover that points it at the demand driver
  lives in the follow-up PR in this stack. Until then
  the `CountingRegistry`'s `ManifestProvider`-side job
  counter stays at zero (the legacy path bypasses it),
  so the single-flight assertion below has nothing to
  count. The cutover PR removes this `#[ignore]`
  alongside flipping the entry-point bounds." comment
  block and the `#[ignore = "exercises the
  demand-driver pipeline, wired in the cutover PR"]`
  attribute both come off; they sat above the
  `#[tokio::test]` attribute on `async fn
  test_non_semver_exact_version_extract_single_flight()`.
  The test body is unchanged, and the `#[tokio::test]`
  attribute and the `async fn` signature stay. The
  rest of the driver — the `MockRegistryClient`
  scaffolding, the `CountingRegistry` wrapper that
  decorates an inner `RegistryClient` impl with an
  `AtomicUsize`-per-package-name request counter
  visible on the outer wrapper for the test
  assertions, the `create_version_manifest` and
  `create_full_manifest` helpers, the half-dozen other
  driver-internal tests (the queue-pop-ordering test,
  the waiter-wake-on-fetch-complete test, the
  state-transition-from-pending-to-cached test, the
  fetch-error-propagates-to-WithChain test, etc.) that
  cover the loop's invariants in isolation without
  going through the public `resolve` entry — is
  unchanged from PR-B's tip. `cargo test -p
  utoo-ruborist --lib` on this commit's tree reports
  `182 passed; 0 failed; 0 ignored` (the formerly-ignored
  one joins the passing list because the assertion now
  reads `left: 1, right: 1` instead of the
  legacy-path's `left: 0, right: 1`).
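
The round-robin pool described for `service/http.rs` above can be
sketched as follows. This is a minimal std-only illustration, not the
crate's code: `Client` is a hypothetical stand-in for
`reqwest::Client`, and its `id` field exists only so the rotation is
observable. `CLIENT_POOL_SIZE`, `CLIENTS`, `NEXT_CLIENT_IDX` and
`get_client` follow the names in the description.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::LazyLock;

/// Number of independent clients in the pool (4 in the description).
const CLIENT_POOL_SIZE: usize = 4;

/// Hypothetical stand-in for `reqwest::Client`; `id` marks the pool slot.
#[derive(Debug)]
pub struct Client {
    pub id: usize,
}

/// The pool is built once, on first access.
static CLIENTS: LazyLock<[Client; CLIENT_POOL_SIZE]> =
    LazyLock::new(|| std::array::from_fn(|id| Client { id }));

/// Monotonic counter; the slot is the counter modulo the pool size.
static NEXT_CLIENT_IDX: AtomicUsize = AtomicUsize::new(0);

/// Same signature shape as before the change: callers still receive a
/// `&'static` client and do not need to know a pool sits behind it.
pub fn get_client() -> &'static Client {
    let idx = NEXT_CLIENT_IDX.fetch_add(1, Ordering::Relaxed) % CLIENT_POOL_SIZE;
    &CLIENTS[idx]
}
```

`Ordering::Relaxed` is enough in this sketch because the counter only
spreads load; no other memory access is ordered against it.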

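The branch table described for `util/user_config.rs` above can be
read as the sketch below. It is an illustration under stated
assumptions, not the crate's implementation: treating a missing
`semver_mode_override` as "semver mode off" is an assumption, and
`effective_limit` is a hypothetical helper showing one way an explicit
user cap could take precedence in the public wrapper.

```rust
/// Cap for the public npmjs registry with semver mode off
/// (256 in the description above).
const NPM_DEMAND_CAP: usize = 256;

/// Sketch of the branch table. `default_cap` is the registry default
/// (64 for npmjs.org in the description).
fn resolver_manifest_concurrency_limit(
    default_cap: usize,
    is_npm_default_registry: bool,
    semver_mode_override: Option<bool>,
) -> usize {
    // Assumption: an absent override means semver mode is off.
    let semver_mode = semver_mode_override.unwrap_or(false);
    if is_npm_default_registry && !semver_mode {
        NPM_DEMAND_CAP
    } else {
        // Semver mode on, or any non-npmjs registry.
        default_cap
    }
}

/// Hypothetical wrapper step: an explicit user cap wins over the table.
fn effective_limit(user_cap: Option<usize>, table_value: usize) -> usize {
    user_cap.unwrap_or(table_value)
}
```

For example, with a default cap of 64, the npmjs default registry with
semver mode off yields 256, while semver mode on or any other registry
yields 64.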
The bench gate on this PR is the load-bearing
verification of the entire stack's perf claim. Its target
is the integrated-form #3084's bench number recorded at
the workflow run `26551226857` on the integrated commit
`48339925`: `utoo` p1_resolve `2.45s ± 0.13` and
allocation-side `vCtx` `18.1K` on the standard
`ant-design` workspace under the `pm-e2e-bench.yml`
workflow's `bench-phases-linux` job's `p1_resolve` phase
table row. Against the same workflow's `utoo-next`
baseline column on the same comment (`2.86s ± 0.04` and
`vCtx 47.4K`), the cutover's targeted improvement is the
14 % wall-clock reduction and the 2.6× vCtx reduction the
perf spec promised. Against the same workflow's `bun`
column at `1.95s ± 0.06` and `vCtx 8.0K`, a gap remains;
closing it is the job of the post-cutover cleanup PR's
warm-cache rationalization. That cleanup deletes the
`crate::resolver::preload` module entirely, replaces the
`UnifiedRegistry`'s stateful `OnceMap`-shaped
inflight-tracking with the demand pipeline's own queue,
retires the `BuildEvent::Preload*` event variants and
their progress-receiver hooks, drops the
`RegistryClient::cache_version_manifest` shim, and leaves
what remains to the dead-code lint. The cleanup is
bench-tracked against this cutover's bench result as its
baseline, to confirm that the wall-clock and vCtx numbers
hold after the deletions remove the carry-along
scaffolding. It is the long-tail follow-up and does not
block this cutover's review.

The integrated-form #3084 (the PR Andrew has open on
the `perf/pm-resolver-driver` branch at commit
4833992, the all-in-one form that this three-PR stack
replaces with a layered equivalent) stays open as the
bench reference until this three-PR stack lands in
`next`. After the stack lands, #3084 closes as
superseded by the same payload landed in three
reviewable layers. Its commit history at the integrated
form's tip remains on the GitHub PR's timeline as the
historical record of the original integration that
motivated the split.

The two sibling bench gates in this stack — PR-A
#3085's gate at the amended type-level-scaffolding sha
`c16893ec` and PR-B #3086's gate at the rebased
driver-landing sha `63a18ad6` — are the no-regression
floor that this PR's win measures against. The
expected outcome on each of the two sibling gates is
"flat vs `utoo-next`" because the active runtime path
in each is byte-identical to next (the trait surface
in PR-A is unreferenced from any live call site, the
demand driver in PR-B is in the binary but the
entry-points still go through the legacy preload-and-BFS
chain, the orphan-annotation-receiving preload-era
functions are still the only callees of
`build_deps_with_config`'s body). The cutover-side
delta — what this PR introduces — is what flips the
active path from the legacy resolver to the demand
pipeline, lights up the runtime tunings, and gives the
bench-machine's resolve-phase timer the integrated
form's 2.45-second number to land at.

Three-of-three of the #3084 split. The merge order
across the stack is PR-A #3085 (the trait, no runtime
delta), then PR-B #3086 (the driver, no runtime delta
because the entry-points still bypass it), then this PR
(the entry-point flip and the runtime tunings, the
runtime delta that delivers the perf claim). GitHub
automatically retargets each PR's base from its stack
predecessor's branch to `next` once the predecessor
merges and its branch is deleted. The bench gate fires on
each push to each PR's branch independently, so the bench
comments on the three PRs appear in the order the bench
machine pulls them from its queue, which matches the
stack order modulo the GitHub Actions runner's per-job
timing.

Also fixes a latent gap in the shared fetch retry layer that the
cutover's higher concurrency surfaces. `service/fetch.rs`'s
`classify_status` mapped every non-404/429/5xx HTTP status to
`FetchError::Permanent` via a catch-all arm, so a `406 Not
Acceptable` was never retried. npmjs's Cloudflare edge
intermittently answers a manifest request with 406 under heavy
concurrent fan-out — a different package each run, ~one per run, a
transient content-negotiation hiccup rather than a real
unsatisfiable-Accept (which would 406 every request). The legacy
two-phase resolver's gentler 64-wide single-client fetch rarely
tripped it; the demand driver's 256-wide four-client pool fan-out
trips it on nearly every e2e run, failing `utoo install` on a
single stray 406. Reclassify 406 as `Retryable` alongside 429 so
the existing five-delay backoff absorbs it. The buggy catch-all
predates this stack (it lives in the shared layer both paths use),
but the cutover is what makes it user-visible, so the one-line fix
rides with it.
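
The reclassification can be pictured with the sketch below. It is a
simplified illustration rather than the code in `service/fetch.rs`:
the `FetchError` variants' payloads, the `NotFound` variant, the
`Result<(), FetchError>` return type, and the treatment of 5xx as
retryable are assumptions; only the names `classify_status`,
`Retryable` and `Permanent` come from the description.

```rust
/// Outcome classes for a non-success response (shape is an assumption).
#[derive(Debug, PartialEq, Eq)]
pub enum FetchError {
    NotFound,
    Retryable(u16),
    Permanent(u16),
}

/// Map an HTTP status code to a retry decision.
pub fn classify_status(status: u16) -> Result<(), FetchError> {
    match status {
        200..=299 => Ok(()),
        404 => Err(FetchError::NotFound),
        // 406 joins 429: treated as a transient edge hiccup and retried.
        406 | 429 => Err(FetchError::Retryable(status)),
        // Assumption: server errors are retried as well.
        500..=599 => Err(FetchError::Retryable(status)),
        // The catch-all still fails fast for every other status.
        _ => Err(FetchError::Permanent(status)),
    }
}
```

With this shape, a stray 406 goes through the same backoff as a 429
instead of failing the install on the first response.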

Refs #3028, #3083 (the select-and-state scaffolding
already in next as the stack base), #3084 (the
integrated-form reference, still open at 4833992 on
the `perf/pm-resolver-driver` branch), #3085 (PR-A,
the trait + adapter, draft at sha `c16893ec`), #3086
(PR-B, the driver + four graph helpers, draft at sha
`63a18ad6`).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>