Semi-lazy imports + auto-mode forkserver preload #1253

Merged
nkemnitz merged 13 commits into main from dodam/semi_lazy_import on May 20, 2026

@dodamih dodamih commented May 15, 2026

Summary

Replaces the all-or-nothing forkserver preload with a CUE-driven auto mode that loads only the modules a given spec actually needs. Adds a static @register index, a lookup-miss fallback for late-bound names, dynamic resolvers for np.* / torch.*, and a shared make_lazy_module helper used to lazify ~25 subpackage __init__.py files. Wires the computed preload list to remote workers via env var. Flips the CLI default to auto.

Motivation

zetta run and setup_environment("all") used to import every preload bundle eagerly (~1.4 GB RSS, ~12-15 s on a typical laptop) regardless of what the spec needed. Tri's debug iteration in particular hits this on every CUE typo. After this PR:

| Scenario | Before | After |
|---|---|---|
| import zetta_utils (bare) | ~1.4 s | ~1.2 s (no numpy/torch) |
| Fast-fail on CUE typo | ~12-15 s wasted | <2 s |
| Trivial spec (no torch/mazepa) | ~12 s / 1.4 GB | ~2-3 s / 105 MB |
| Production spec (test3.cue: internal alignment + mazepa) | ~12 s / 1.4 GB | ~5-7 s / 105 MB parent (-89% RSS) |
| Worker daemon RSS (workers fork from this) | ~1.4 GB | ~770-830 MB (~600 MB CoW saving per pod) |

Everything else stays opt-out: zetta run -l all still works for any case where lazy is undesirable.

Architecture

Phase 0: Static @register index (scan.py)

AST-walk the package looking for @register('name') decorators and register('name')(target) direct calls. Build a name → module index without executing any imports. Disk-cached at ~/.cache/zetta_utils/builder_index.json. Cold scan ~220 ms; warm <50 ms.

Phase 1: Lookup-miss fallback (registry.py)

get_matching_entry consults the index when REGISTRY misses, imports the candidate module(s), retries. Idempotent, thread-safe. Behavior is unchanged when REGISTRY is fully populated; the fallback only fires on lookups that would have raised.
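A minimal sketch of the lookup-miss pattern described above, under simplifying assumptions: `REGISTRY` here maps a name straight to an object, `STATIC_INDEX` is a plain dict, and the `importer` parameter is a hypothetical hook added only so the example can be exercised without real modules. The real get_matching_entry handles versions and multiple matches, which this omits.

```python
import importlib
import threading
from typing import Any, Callable

REGISTRY: dict[str, Any] = {}             # name -> registered object (simplified)
STATIC_INDEX: dict[str, list[str]] = {}   # name -> candidate module paths
_LOCK = threading.RLock()                 # re-entrant: an import may trigger lookups


def register(name: str) -> Callable[[Any], Any]:
    def decorator(obj: Any) -> Any:
        REGISTRY[name] = obj
        return obj

    return decorator


def get_matching_entry(
    name: str, importer: Callable[[str], Any] = importlib.import_module
) -> Any:
    # Fast path: unchanged behavior when the name is already registered.
    try:
        return REGISTRY[name]
    except KeyError:
        pass
    with _LOCK:
        # Re-check under the lock so concurrent misses import only once.
        if name not in REGISTRY:
            for module_path in STATIC_INDEX.get(name, ()):
                importer(module_path)  # importing runs the @register side effects
        if name in REGISTRY:
            return REGISTRY[name]
    raise KeyError(f"no registered entry for {name!r}")
```

Because the fallback runs only after the plain dictionary lookup fails, a fully populated registry never touches the index or the lock.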

Phase 2: none preload mode

Skip the eager load_*_modules() step in the parent. Daemon still spawns eagerly with preload/none.py (always-eager: parallel, log, builder). Lookup-miss fallback handles everything else on demand.

Phase 3: auto preload mode

Parse the CUE, walk it for @type literals (parsing/spec_scan.py), resolve through the static index to a minimal module set (builder/preload/__init__.py: compute_preload_set), and feed that list directly to multiprocessing.set_forkserver_preload(...), so no preload shim file is needed.
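The auto-mode computation can be sketched roughly as follows, assuming the CUE has already been parsed into nested dicts and lists. The names `extract_types` and `compute_preload_set` come from the PR, but the signatures and the `index` shape used here are guesses for illustration only.

```python
from typing import Any, Iterator


def extract_types(spec: Any) -> Iterator[str]:
    """Yield every string stored under an "@type" key, at any depth."""
    if isinstance(spec, dict):
        for key, value in spec.items():
            if key == "@type" and isinstance(value, str):
                yield value
            else:
                yield from extract_types(value)
    elif isinstance(spec, (list, tuple)):
        for item in spec:
            yield from extract_types(item)


def compute_preload_set(spec: Any, index: dict[str, list[str]]) -> list[str]:
    """Resolve the spec's @type names to a sorted, de-duplicated module list."""
    modules: set[str] = set()
    for type_name in extract_types(spec):
        # Names missing from the index are skipped; in the PR they would be
        # left to the lookup-miss fallback or a dynamic resolver.
        modules.update(index.get(type_name, ()))
    return sorted(modules)
```

The resulting list is the kind of value that `multiprocessing.set_forkserver_preload(modules)` accepts; that call has to happen before the forkserver process is started for the preload to take effect.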

Phase 4: Lazy subpackage __init__.py (common/lazy.py)

A package that re-exports many submodules pays a transitive import cost on the first import of the package, even when the caller wants only one symbol. make_lazy_module(__name__, globals(), subpackages, reexports_by_module) returns __getattr__ + __dir__ that resolve names on first access. Applied to:

  • mazepa, mazepa_layer_processing, layer, tensor_ops, augmentations, db_annotations, message_queues, task_management, training
  • cloud_management, cloud_management/resource_allocation/{__init__, gcloud, k8s}
  • convnet, convnet/architecture, convnet/architecture/deprecated

The helper falls back to importlib.import_module(".{name}", pkg) for unknown names, mirroring Python's natural submodule attribute behavior, so pkg.X.Y works whether or not X is explicitly declared.
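For readers unfamiliar with module-level `__getattr__` (PEP 562), here is a minimal sketch of what a helper with this call shape could look like. It follows the argument order quoted above, but the body is an assumption and not the code in common/lazy.py.

```python
import importlib
from typing import Any, Callable


def make_lazy_module(
    pkg_name: str,
    pkg_globals: dict[str, Any],
    subpackages: tuple[str, ...],
    reexports_by_module: dict[str, tuple[str, ...]],
) -> tuple[Callable[[str], Any], Callable[[], list[str]]]:
    # Invert {module: (names,)} into {name: module} for direct lookup.
    owner = {name: mod for mod, names in reexports_by_module.items() for name in names}

    def __getattr__(name: str) -> Any:
        if name in owner:
            module = importlib.import_module(f".{owner[name]}", pkg_name)
            value = getattr(module, name)
        else:
            # Declared subpackages and unknown names both resolve as submodules.
            try:
                value = importlib.import_module(f".{name}", pkg_name)
            except ModuleNotFoundError as exc:
                if exc.name != f"{pkg_name}.{name}":
                    raise  # a dependency inside the submodule is missing
                raise AttributeError(
                    f"module {pkg_name!r} has no attribute {name!r}"
                ) from exc
        pkg_globals[name] = value  # cache so later access bypasses __getattr__
        return value

    def __dir__() -> list[str]:
        return sorted(set(pkg_globals) | set(subpackages) | set(owner))

    return __getattr__, __dir__
```

In a package `__init__.py` this would be wired up as `__getattr__, __dir__ = make_lazy_module(__name__, globals(), (...), {...})`, which is the short declaration the PR describes.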

Dynamic resolvers (built_in_registrations.py)

built_in_registrations.py and convnet/architecture/primitives.py previously enumerated dir(numpy), dir(torch.nn), etc. at import time, registering ~1400 names and importing numpy and torch as a side effect. Now _resolve_numpy("np.foo") and _resolve_torch("torch.nn.Bar") do getattr(numpy, "foo") / getattr(torch.nn, "Bar") on first lookup. Net: numpy and torch are no longer in the always-eager set; they are loaded only when first needed.
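The resolver idea can be sketched generically as below. `make_prefix_resolver` is a hypothetical helper invented for this example; under that assumption the PR's `_resolve_numpy` would correspond to something like `make_prefix_resolver("np.", "numpy")`. The standard-library `os` module stands in for numpy/torch in the usage note so the example has no third-party dependency.

```python
import importlib
from typing import Any, Callable


def make_prefix_resolver(prefix: str, module_name: str) -> Callable[[str], Any]:
    """Build a resolver that maps '<prefix>a.b' to getattr(getattr(module, 'a'), 'b')."""

    def resolve(name: str) -> Any:
        if not name.startswith(prefix):
            raise KeyError(name)
        # The heavy import is deferred until the first matching lookup.
        obj: Any = importlib.import_module(module_name)
        for part in name[len(prefix):].split("."):
            try:
                obj = getattr(obj, part)
            except AttributeError as exc:
                raise KeyError(name) from exc
        return obj

    return resolve
```

For example, `make_prefix_resolver("os.", "os")("os.path.join")` walks `os` then `path` then `join`. A registry would typically cache the resolved object so the walk runs once per name.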

Phase 5: Remote workers (cloud_management/resource_allocation/k8s)

Workers spawned by execute_on_gcp_with_sqs previously used --load_mode try and let each worker load all preloaded subpackages itself. With auto mode, the master computes the minimal preload set from the original CUE and ships it via a ZETTA_PRELOAD_MODULES env var on the worker pod. The CLI reads the env var on startup and passes the explicit list to setup_environment(load_mode='auto', preload_modules=[...]). Master's _compute_worker_preload_modules re-scans ZETTA_RUN_SPEC_PATH to derive the list.
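The env-var handoff amounts to a comma-joined list on the master side and a split on the worker side. The sketch below assumes that shape; the helper names are made up for illustration, and the handling of an empty value (treated as an empty list that still forces auto) is a guess rather than documented behavior.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "ZETTA_PRELOAD_MODULES"


def encode_preload_modules(modules: list[str]) -> str:
    """Master side: the value to place in the worker pod's environment."""
    return ",".join(modules)


def read_preload_modules(environ: Mapping[str, str] = os.environ) -> Optional[list[str]]:
    """Worker side: None when the variable is absent, else the module list."""
    raw = environ.get(ENV_VAR)
    if raw is None:
        return None
    return [item.strip() for item in raw.split(",") if item.strip()]


def resolve_load_mode(
    cli_load_mode: str, environ: Mapping[str, str] = os.environ
) -> tuple[str, Optional[list[str]]]:
    """The env var, when present, overrides whatever --load_mode was passed."""
    modules = read_preload_modules(environ)
    if modules is None:
        return cli_load_mode, None
    return "auto", modules
```

Passing `environ` explicitly keeps the parsing testable without mutating the process environment.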

Default flip

The --load_mode default is now auto. -l all is still available as the escape hatch.

Files changed

New:

  • zetta_utils/builder/scan.py — AST-driven static index
  • zetta_utils/builder/preload/none.py — minimal preload bundle
  • zetta_utils/parsing/spec_scan.py: extract_types for CUE specs
  • zetta_utils/common/lazy.py: make_lazy_module helper
  • tests/unit/builder/test_scan.py (17 tests)
  • tests/unit/builder/test_lazy_fallback.py (11 tests)
  • tests/unit/builder/test_index_parity.py (3 tests)
  • tests/unit/builder/test_none_mode.py (4 tests)
  • tests/unit/builder/test_auto_mode.py (2 tests)
  • tests/unit/builder/test_compute_preload.py (5 tests)
  • tests/unit/builder/test_resolvers.py (15 tests)
  • tests/unit/parsing/test_spec_scan.py (6 tests)
  • tests/unit/common/test_lazy.py (7 tests)

Modified:

  • zetta_utils/__init__.py: setup_environment(load_mode, cue_path, preload_modules); new auto and none modes
  • zetta_utils/builder/{registry, __init__, built_in_registrations}.py — fallback path, dynamic resolvers
  • zetta_utils/builder/preload/__init__.py: compute_preload_set
  • zetta_utils/cli/main.py: --load_mode auto default; reads ZETTA_PRELOAD_MODULES env var
  • zetta_utils/cloud_management/resource_allocation/k8s/{pod, deployment}.py — thread preload_modules to worker pod env
  • zetta_utils/mazepa_addons/configurations/execute_on_gcp_with_sqs.py — compute and ship preload list
  • zetta_utils/convnet/architecture/primitives.py — drop torch.* import-time loops
  • 16 __init__.py files lazified via the helper

Behavior changes worth knowing about

  1. setup_environment("all") no longer eagerly cascades into every submodule. Because the __init__.py files are lazified, from zetta_utils import mazepa loads only the package object, not its submodules. The lookup-miss fallback fills the gap on first use. Existing tests that assumed REGISTRY was "fully populated" after setup_environment("all") may need updating to use a name from the always-eager set (the parity test was updated to use lambda instead of brightness_aug).

  2. CLI default flip. zetta run path.cue defaults to auto mode. -l all is the rollback path.

  3. internal/__init__.py chunkedgraph try/except. Was eager (logged a warning at import time if missing); now lazy (raises ModuleNotFoundError only on first access of build_chunkedgraph_segmentation_flow).

  4. Worker --load_mode try is still hardcoded in get_mazepa_worker_command for back-compat. When the master sets ZETTA_PRELOAD_MODULES, the CLI overrides the mode to auto. Workers without the env var fall back to try (the current behavior).

Test plan

  • Unit: 100% line coverage on the 12 files I authored/modified for this work (scan.py, registry.py, built_in_registrations.py, preload/{__init__, all, core, inference, none, training}.py, parsing/spec_scan.py, common/lazy.py, cli/main.py). 2 lines in scan.py carry # pragma: no cover for filesystem-race OSError paths that aren't reproducible in unit tests.
  • Local full suite: 2008/2009 tests pass; the 1 pre-existing failure is a docker-fixture environmental issue.
  • End-to-end smoke:
    • setup_environment("none") then build() of a non-eager spec → succeeds via lookup-miss fallback
    • setup_environment("auto", cue_path=...) → daemon preloads exactly the computed set
    • Subprocess with ZETTA_PRELOAD_MODULES env var → daemon preload list matches the env var
  • CI green
  • Production validation: zetta run an internal CUE in a dev k8s cluster, confirm worker pods get the env var and resource usage drops

Pairs with

internal #198 — must merge after this lands so the lazy __init__.py updates in internal/ align with the helper this PR introduces.

🤖 Generated with Claude Code

codecov Bot commented May 15, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (f1a2436) to head (615f56e).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff             @@
##             main     #1253      +/-   ##
===========================================
+ Coverage   99.98%   100.00%   +0.01%     
===========================================
  Files         206       211       +5     
  Lines       10920     11292     +372     
===========================================
+ Hits        10918     11292     +374     
+ Misses          2         0       -2     

@dodamih dodamih force-pushed the dodam/semi_lazy_import branch 2 times, most recently from 1537137 to 1460fe9 Compare May 18, 2026 13:27
@dodamih dodamih requested a review from supersergiy May 18, 2026 15:36
@dodamih dodamih force-pushed the dodam/semi_lazy_import branch 2 times, most recently from 4693c00 to 6bb7317 Compare May 19, 2026 15:23

@nkemnitz nkemnitz left a comment

Pending tests and coverage

@dodamih dodamih force-pushed the dodam/semi_lazy_import branch from 6bb7317 to 2215821 Compare May 20, 2026 01:11
dodamih and others added 8 commits May 19, 2026 19:18
Scans the package with `ast` to discover every @register-decorated and
direct-call register("name")(target) site without executing the modules.
Registry's get_matching_entry now consults this index on miss, importing
the candidate module(s) on demand. Behavior is unchanged when REGISTRY is
already populated; the fallback only activates on lookups that would have
raised. Foundation for upcoming lazy preload modes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
'none' registers only the always-eager set (forkserver patch, log, builder
itself) and skips eager loading in the parent; the lookup-miss fallback
imports modules on demand via the static index. 'auto' requires cue_path,
scans the spec for @type literals, resolves them through the index, and
preloads exactly those modules — typically a handful instead of the full
set. Both fall back to the existing behavior on error or unsupported input.

Also adds parsing.spec_scan.extract_types() for walking parsed CUE specs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds zetta_utils/common/lazy.py — a small helper that takes a package's
name + globals + a tuple of subpackages + a {module: (names,)} re-export
mapping and returns __getattr__ + __dir__. Each lazy package becomes a
~10-line declaration instead of repeating the same import plumbing.

Lazifies these subpackage __init__.py files (each previously eagerly
chained into all of its siblings, defeating the auto preload set's
intent):
  - training, mazepa, mazepa_layer_processing
  - layer, tensor_ops
  - augmentations, db_annotations, message_queues, task_management
  - cloud_management, cloud_management/resource_allocation,
    cloud_management/resource_allocation/{gcloud, k8s}
  - convnet, convnet/architecture, convnet/architecture/deprecated

mazepa_layer_processing/__init__.py keeps its eager builder.register
calls so the "mazepa.Executor" / "mazepa.execute" etc. names land in the
static index at scan time.

Behavior change worth noting: setup_environment("all") used to eagerly
chain into every submodule via the now-lazy __init__.py cascades, so
REGISTRY ended up fully populated at startup. With the lazy treatment
the eager preload covers only the always-eager set + whatever the
preload/{all,inference,training}.py modules name explicitly. The
lookup-miss fallback in registry.get_matching_entry handles the rest on
first use, at the cost of one import on first lookup of a never-touched
name. Updated test_already_registered_name_skips_fallback to use
"lambda" (always-eager) instead of "brightness_aug" which is no longer
preloaded.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds register_dynamic_resolver(prefix, fn) to registry. After the literal
REGISTRY lookup and the static-index fallback both miss, get_matching_entry
consults installed resolvers; on hit the resolver's RegistryEntry is cached
in REGISTRY so subsequent lookups skip the resolver.

Replaces two import-time loops:
  - built_in_registrations.py: enumerated dir(numpy) and registered every
    routine + numeric constant up front (~290 entries, also pulled the
    full numpy import). Now a `_resolve_numpy` resolver does the same
    getattr lookup on demand.
  - convnet/architecture/primitives.py: enumerated dir(torch.nn) /
    dir(torch.optim) / dir(torch.optim.lr_scheduler) / etc. and
    registered every public class or builtin method (~1100 entries, also
    pulled torch). Now a `_resolve_torch` resolver walks the dotted path
    inside torch on demand.

Net effect: numpy and torch are no longer imported by `import zetta_utils`
or by `setup_environment("none")`. They get pulled the first time any
module that uses them is loaded, or the first time a `np.*` / `torch.*`
name is looked up.

Explicit decorations like `torch.nn.Sequential` (defined in primitives.py)
still win because the static-index fallback runs before the resolver.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nv var

Workers spawned by execute_on_gcp_with_sqs previously used --load_mode try
and let each worker load all preloaded subpackages itself. With auto mode
the master knows the minimal preload set for the spec, but had no way to
hand it to remote workers — workers don't see the original CUE.

Wires it via an env var on the worker pod:

  - setup_environment gains a preload_modules: list[str] | None kwarg.
    When set with load_mode='auto', skips the CUE scan and uses the list
    directly.
  - CLI reads ZETTA_PRELOAD_MODULES, splits on comma, forces auto mode
    when present (overriding --load_mode). Backward-compatible: if unset,
    behavior is unchanged.
  - get_mazepa_pod_spec accepts preload_modules and appends a V1EnvVar
    ZETTA_PRELOAD_MODULES=<comma-list> to the worker container env.
  - get_mazepa_worker_deployment and _get_mazepa_deployment thread the
    list down to get_mazepa_pod_spec.
  - get_gcp_with_sqs_config calls a new _compute_worker_preload_modules
    helper that re-scans ZETTA_RUN_SPEC_PATH (the master's recorded spec
    file) and computes the preload set, then passes it to every worker
    group. Failure to compute returns None silently — workers fall back
    to whatever --load_mode the CLI was started with.

End-to-end verified: a subprocess with ZETTA_PRELOAD_MODULES set spawns
the daemon with exactly that list, and lookups for names not in the list
still resolve via the static-index fallback.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Flips the default preload mode from 'all' to 'auto'. zetta run on a CUE
file now scans the spec for @type literals and preloads only the modules
needed, instead of importing every preload bundle.

Side change: materialize --str_spec to its tempfile path BEFORE
setup_environment runs, so auto mode can scan str_spec invocations too.
The previous flow wrote the tempfile after setup, which meant str_spec
auto would fall back to 'all' for lack of a cue_path. Tempfile path now
goes through the same auto path as -p PATH.

For users that want the old behavior: pass -l all explicitly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds tests covering every line of the new and changed files in the
semi-lazy import work:

  - zetta_utils/builder/scan.py
  - zetta_utils/builder/registry.py
  - zetta_utils/builder/built_in_registrations.py
  - zetta_utils/builder/preload/__init__.py
  - zetta_utils/parsing/spec_scan.py
  - zetta_utils/common/lazy.py
  - zetta_utils/cli/main.py

New test files: test_resolvers.py (np.* / torch.* dynamic resolver
paths), tests/unit/common/test_lazy.py (make_lazy_module __getattr__ +
__dir__ + fallback submodule import + AttributeError path). Extended
test_lazy_fallback.py (failed-import logging, dynamic-resolver exception
swallowing, multiple-matches error, register-duplicate-version error,
unregister, under-lock recheck). Extended test_scan.py (syntax error
files, nonexistent root, large-file path, unreadable file, corrupt cache
JSON, wrong-shape cache, multi-root file resolution, cache write OSError,
register call with **kwargs).

Fixed test_index_parity.py to also skip lambdas — they're never statically
discoverable, so a CLI extra-import test that registers a lambda inside
zetta_utils.cli.main no longer trips parity.

Two filesystem-race paths in scan.py (_candidate_files OSError after
candidate detection; _cache_key stat OSError) carry `# pragma: no cover` —
they guard against files disappearing mid-walk and aren't reliably
reproducible in a unit test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@dodamih dodamih force-pushed the dodam/semi_lazy_import branch from 2215821 to 40616d3 Compare May 20, 2026 02:18
dodamih and others added 2 commits May 19, 2026 20:58
psutil.virtual_memory() reads /proc/meminfo which inside a k8s pod
reports the NODE's memory, not the pod's cgroup limit. A pod requesting
13 GiB on a 30 GiB node could be at ~96% of its limit while
get_memory_usage() reported ~40%. Misleading for any consumer of
RESOURCE_STATS_FILE that cares about pod-relative pressure.

Adds _read_cgroup_memory() which reads memory.current / memory.max
(cgroup v2) or memory.usage_in_bytes / memory.limit_in_bytes (cgroup
v1). get_memory_usage() prefers the cgroup view and falls back to
psutil when:
  - the cgroup files are absent (running outside a container)
  - the limit is unset / sentinel (no memory constraint configured)
  - the limit field contains "max" (cgroup v2 string form)

The OOM tracker sidecar already does the right thing via the k8s API;
this brings the in-container reporter in line with it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
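
The cgroup lookup described in the commit message above can be sketched as follows. This is an illustration under stated assumptions, not the committed `_read_cgroup_memory`: the `root` parameter and the "very large limit means unlimited" threshold are choices made for this example, and returning `None` is used here to signal that the caller should fall back to psutil.

```python
from pathlib import Path
from typing import Optional

# (usage file, limit file) relative to the cgroup mount point.
_CANDIDATES = (
    ("memory.current", "memory.max"),                                  # cgroup v2
    ("memory/memory.usage_in_bytes", "memory/memory.limit_in_bytes"),  # cgroup v1
)
# cgroup v1 reports "no limit" as a huge page-aligned number; treat anything
# at or above this threshold as unlimited (assumption for this sketch).
_UNLIMITED_THRESHOLD = 1 << 60


def read_cgroup_memory(root: Path = Path("/sys/fs/cgroup")) -> Optional[tuple[int, int]]:
    """Return (usage_bytes, limit_bytes), or None if no usable limit is found."""
    for usage_file, limit_file in _CANDIDATES:
        try:
            usage_text = (root / usage_file).read_text().strip()
            limit_text = (root / limit_file).read_text().strip()
        except OSError:
            continue  # files absent: try the next layout
        if limit_text == "max":
            return None  # cgroup v2 string form for "unlimited"
        try:
            usage, limit = int(usage_text), int(limit_text)
        except ValueError:
            return None
        if limit <= 0 or limit >= _UNLIMITED_THRESHOLD:
            return None
        return usage, limit
    return None
```

A caller would compute `usage / limit` when a tuple comes back and use the node-level psutil figure otherwise.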
Points to the rebased internal branch carrying the matching
lazy-resolve __init__ changes. Will revert / re-bump to internal main
after the internal PR merges.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@dodamih dodamih force-pushed the dodam/semi_lazy_import branch from 40616d3 to 4f9c867 Compare May 20, 2026 03:58
@nkemnitz nkemnitz merged commit 9db4f2f into main May 20, 2026
11 checks passed
@nkemnitz nkemnitz deleted the dodam/semi_lazy_import branch May 20, 2026 14:22