Semi-lazy imports + auto-mode forkserver preload by dodamih · Pull Request #1253 · ZettaAI/zetta_utils

dodamih · 2026-05-15T13:29:18Z

Summary

Replaces the all-or-nothing forkserver preload with a CUE-driven auto mode that loads only the modules a given spec actually needs. Adds a static @register index, a lookup-miss fallback for late-bound names, dynamic resolvers for np.* / torch.*, and a shared make_lazy_module helper used to lazify ~25 subpackage __init__.py files. Wires the computed preload list to remote workers via an env var. Flips the CLI default to auto.

Motivation

zetta run and setup_environment("all") used to import every preload bundle eagerly (~1.4 GB RSS, ~12-15 s on a typical laptop) regardless of what the spec needed. Tri's debug iteration in particular hits this on every CUE typo. After this PR:

| Scenario | Before | After |
|---|---|---|
| `import zetta_utils` (bare) | ~1.4 s | ~1.2 s (no numpy/torch) |
| Fast-fail on CUE typo | ~12-15 s wasted | <2 s |
| Trivial spec (no torch/mazepa) | ~12 s / 1.4 GB | ~2-3 s / 105 MB |
| Production spec (test3.cue: internal alignment + mazepa) | ~12 s / 1.4 GB | ~5-7 s / 105 MB parent (-89% RSS) |
| Worker daemon RSS (workers fork from this) | ~1.4 GB | ~770-830 MB (~600 MB CoW saving per pod) |

Lazy loading stays opt-out: zetta run -l all still works for any case where lazy loading is undesirable.

Architecture

Phase 0: Static `@register` index (scan.py)

AST-walk the package looking for @register('name') decorators and register('name')(target) direct calls. Build a name → module index without executing any imports. Disk-cached at ~/.cache/zetta_utils/builder_index.json. Cold scan ~220 ms; warm <50 ms.

Phase 1: Lookup-miss fallback (registry.py)

get_matching_entry consults the index when REGISTRY misses, imports the candidate module(s), and retries the lookup. The fallback is idempotent and thread-safe. Behavior is unchanged when REGISTRY is fully populated; the fallback only fires on lookups that would otherwise have raised.
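The Phase 1 lookup-miss fallback could look roughly like the sketch below. It is not registry.py. `REGISTRY`, `INDEX`, and `register` are simplified stand-ins; the real registry stores `RegistryEntry` objects and supports versions. A hit in REGISTRY never touches the lock. A miss takes a re-entrant lock, re-checks REGISTRY, imports the indexed candidates, and retries.

```python
from __future__ import annotations

import importlib
import logging
import threading
from typing import Any, Callable

logger = logging.getLogger(__name__)

REGISTRY: dict[str, Any] = {}
# Hypothetical static index: registered name -> candidate modules (as produced by the AST scan)
INDEX: dict[str, list[str]] = {}
# Re-entrant because importing a candidate may itself trigger a nested lookup
_LOCK = threading.RLock()


def register(name: str) -> Callable[[Any], Any]:
    def decorator(target: Any) -> Any:
        REGISTRY[name] = target
        return target

    return decorator


def get_matching_entry(name: str) -> Any:
    entry = REGISTRY.get(name)
    if entry is not None:
        return entry  # fast path: unchanged behavior when REGISTRY is populated
    with _LOCK:
        entry = REGISTRY.get(name)  # re-check: another thread may have imported it meanwhile
        if entry is None:
            for module in INDEX.get(name, []):
                try:
                    importlib.import_module(module)  # runs the module's register(...) side effects
                except ImportError as exc:
                    logger.warning("fallback import of %s failed: %s", module, exc)
            entry = REGISTRY.get(name)
    if entry is None:
        raise KeyError(f"{name!r} is not registered")
    return entry
```

Because `importlib.import_module` is a no-op for modules that are already imported, repeated misses on the same name are idempotent.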

Phase 2: `none` preload mode

Skip the eager load_*_modules() step in the parent. Daemon still spawns eagerly with preload/none.py (always-eager: parallel, log, builder). Lookup-miss fallback handles everything else on demand.

Phase 3: `auto` preload mode

Parse the CUE, walk it for @type literals (parsing/spec_scan.py), resolve them through the static index to a minimal module set (builder/preload/__init__.py: compute_preload_set), and feed that list directly to multiprocessing.set_forkserver_preload(...). No preload shim file is needed.
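Here is a sketch of the Phase 3 computation. It assumes the CUE spec has already been exported to plain dicts and lists, and that the static index maps each name to a single module. `extract_types` and `compute_preload_set` reuse names from the PR, but these signatures are assumptions.

```python
from __future__ import annotations

from typing import Any, Iterable


def extract_types(spec: Any) -> set[str]:
    """Collect every "@type" string literal anywhere in a parsed spec."""
    found: set[str] = set()
    stack = [spec]
    while stack:
        node = stack.pop()
        if isinstance(node, dict):
            value = node.get("@type")
            if isinstance(value, str):
                found.add(value)
            stack.extend(node.values())
        elif isinstance(node, list):
            stack.extend(node)
    return found


def compute_preload_set(spec: Any, index: dict[str, str], always_eager: Iterable[str]) -> list[str]:
    """Minimal module list: the always-eager set plus the module registering each @type."""
    modules = set(always_eager)
    for name in extract_types(spec):
        module = index.get(name)
        if module is not None:  # unindexed names are left to the lookup-miss fallback
            modules.add(module)
    return sorted(modules)
```

On a POSIX host, the returned list is the kind of value that would then be handed to `multiprocessing.set_forkserver_preload(...)`. Names such as `np.*` that are not in the index are skipped here and resolved at runtime instead.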

Phase 4: Lazy subpackage `__init__.py` (common/lazy.py)

A package that re-exports many submodules pays a transitive import cost on the first `import package`, even when the caller wants only one symbol. make_lazy_module(__name__, globals(), subpackages, reexports_by_module) returns a __getattr__ + __dir__ pair that resolves names on first access. Applied to:

mazepa, mazepa_layer_processing, layer, tensor_ops, augmentations, db_annotations, message_queues, task_management, training
cloud_management, cloud_management/resource_allocation/{__init__, gcloud, k8s}
convnet, convnet/architecture, convnet/architecture/deprecated

The helper falls back to importlib.import_module(".{name}", pkg) for unknown names, mirroring Python's natural submodule attribute behavior, so pkg.X.Y works whether or not X is explicitly declared.
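Below is a minimal sketch of a helper with the `make_lazy_module(__name__, globals(), subpackages, reexports_by_module)` shape, built on PEP 562 module-level `__getattr__`/`__dir__`. The mapping format (`{"submodule": ("Name", ...)}`) and the error handling are assumptions; this is not a copy of common/lazy.py.

```python
from __future__ import annotations

import importlib
from typing import Any, Callable, Iterable, Mapping


def make_lazy_module(
    pkg_name: str,
    pkg_globals: dict[str, Any],
    subpackages: Iterable[str],
    reexports_by_module: Mapping[str, Iterable[str]],
) -> tuple[Callable[[str], Any], Callable[[], list[str]]]:
    declared = tuple(subpackages)
    # Invert {"submodule": ("Name", ...)} into {"Name": "submodule"}
    owner = {name: mod for mod, names in reexports_by_module.items() for name in names}

    def __getattr__(name: str) -> Any:
        if name in owner:
            value = getattr(importlib.import_module(f".{owner[name]}", pkg_name), name)
        else:
            # Covers declared subpackages and, like plain Python, undeclared submodules too
            try:
                value = importlib.import_module(f".{name}", pkg_name)
            except ModuleNotFoundError as exc:
                if exc.name != f"{pkg_name}.{name}":
                    raise  # the submodule exists but one of its own imports failed
                raise AttributeError(f"module {pkg_name!r} has no attribute {name!r}") from exc
        pkg_globals[name] = value  # cache so later accesses bypass __getattr__
        return value

    def __dir__() -> list[str]:
        return sorted(set(pkg_globals) | set(declared) | set(owner))

    return __getattr__, __dir__
```

In a package's `__init__.py`, it would be used as `__getattr__, __dir__ = make_lazy_module(__name__, globals(), ("sub",), {"ops": ("add",)})`, where the submodule names are hypothetical. The `exc.name` check matters when a submodule exists but one of its own imports fails. In that case the real `ModuleNotFoundError` propagates instead of being masked as an `AttributeError`.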

Dynamic resolvers (built_in_registrations.py)

built_in_registrations.py and convnet/architecture/primitives.py previously enumerated dir(numpy), dir(torch.nn), etc. at import time, registering ~1400 names and pulling in numpy and torch. Now _resolve_numpy("np.foo") and _resolve_torch("torch.nn.Bar") do getattr(numpy, "foo") / getattr(torch.nn, "Bar") on first lookup. Net: numpy and torch are no longer in the always-eager set; they are loaded only when first needed.
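The dynamic-resolver pattern can be sketched as a list of prefix/resolver pairs that is consulted after the registry misses, with each result cached. To stay runnable without numpy or torch, the demo resolves names against stdlib modules. `module_resolver` and `lookup` are hypothetical helpers, and REGISTRY stores raw objects rather than `RegistryEntry` values.

```python
from __future__ import annotations

import importlib
from typing import Any, Callable

REGISTRY: dict[str, Any] = {}
# (prefix, resolver) pairs consulted only after REGISTRY (and, in the PR, the static index) miss
_RESOLVERS: list[tuple[str, Callable[[str], Any]]] = []


def register_dynamic_resolver(prefix: str, fn: Callable[[str], Any]) -> None:
    _RESOLVERS.append((prefix, fn))


def module_resolver(prefix: str, module_name: str) -> Callable[[str], Any]:
    """Build a resolver mapping '<prefix>a.b' to module_name.a.b; raises AttributeError on a miss."""

    def resolve(name: str) -> Any:
        obj: Any = importlib.import_module(module_name)  # the import happens here, on first lookup
        for part in name[len(prefix):].split("."):
            obj = getattr(obj, part)
        return obj

    return resolve


def lookup(name: str) -> Any:
    if name in REGISTRY:
        return REGISTRY[name]
    for prefix, resolve in _RESOLVERS:
        if name.startswith(prefix):
            try:
                value = resolve(name)
            except Exception:  # a failing resolver is treated as a miss
                continue
            REGISTRY[name] = value  # cache: later lookups skip the resolver
            return value
    raise KeyError(name)
```

Under these assumptions, the numpy case would be `register_dynamic_resolver("np.", module_resolver("np.", "numpy"))`, and numpy would be imported only when the first `np.*` name is looked up.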

Phase 5: Remote workers (cloud_management/resource_allocation/k8s)

Workers spawned by execute_on_gcp_with_sqs previously used --load_mode try, so each worker loaded all preloaded subpackages itself. With auto mode, the master computes the minimal preload set from the original CUE and ships it to the worker pod via the ZETTA_PRELOAD_MODULES env var. The CLI reads the env var on startup and passes the explicit list to setup_environment(load_mode='auto', preload_modules=[...]). On the master, _compute_worker_preload_modules derives the list by re-scanning ZETTA_RUN_SPEC_PATH.
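Here is a sketch of the env-var handoff in Phase 5. It assumes a plain comma-separated list, as the commit notes describe. `encode_preload_env` and `resolve_load_mode` are hypothetical helpers. In the PR, the master side goes through get_mazepa_pod_spec and the worker side through cli/main.py.

```python
from __future__ import annotations

import os
from typing import Mapping, Optional, Sequence

ENV_VAR = "ZETTA_PRELOAD_MODULES"


def encode_preload_env(modules: Sequence[str]) -> dict[str, str]:
    """Master side: the env entry to attach to the worker container."""
    return {ENV_VAR: ",".join(modules)}


def resolve_load_mode(
    cli_load_mode: str, environ: Mapping[str, str] = os.environ
) -> tuple[str, Optional[list[str]]]:
    """Worker side: if the env var is present, force 'auto' with an explicit module list."""
    raw = environ.get(ENV_VAR)
    if raw is None:
        return cli_load_mode, None  # back-compat: keep the --load_mode the worker started with
    modules = [m.strip() for m in raw.split(",") if m.strip()]
    return "auto", modules
```

The worker would then call something like `setup_environment(load_mode=mode, preload_modules=modules)`. With the variable unset, it keeps whatever `--load_mode` it was launched with.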

Default flip

--load_mode now defaults to auto. -l all remains available as the escape hatch.

Files changed

New:

zetta_utils/builder/scan.py — AST-driven static index
zetta_utils/builder/preload/none.py — minimal preload bundle
zetta_utils/parsing/spec_scan.py — extract_types for CUE specs
zetta_utils/common/lazy.py — make_lazy_module helper
tests/unit/builder/test_scan.py (17 tests)
tests/unit/builder/test_lazy_fallback.py (11 tests)
tests/unit/builder/test_index_parity.py (3 tests)
tests/unit/builder/test_none_mode.py (4 tests)
tests/unit/builder/test_auto_mode.py (2 tests)
tests/unit/builder/test_compute_preload.py (5 tests)
tests/unit/builder/test_resolvers.py (15 tests)
tests/unit/parsing/test_spec_scan.py (6 tests)
tests/unit/common/test_lazy.py (7 tests)

Modified:

zetta_utils/__init__.py — setup_environment(load_mode, cue_path, preload_modules); new auto and none modes
zetta_utils/builder/{registry, __init__, built_in_registrations}.py — fallback path, dynamic resolvers
zetta_utils/builder/preload/__init__.py — compute_preload_set
zetta_utils/cli/main.py — --load_mode auto default; reads ZETTA_PRELOAD_MODULES env var
zetta_utils/cloud_management/resource_allocation/k8s/{pod, deployment}.py — thread preload_modules to worker pod env
zetta_utils/mazepa_addons/configurations/execute_on_gcp_with_sqs.py — compute and ship preload list
zetta_utils/convnet/architecture/primitives.py — drop torch.* import-time loops
16 __init__.py files lazified via the helper

Behavior changes worth knowing about

setup_environment("all") no longer eagerly cascades into every submodule. The lazified __init__.py files mean from zetta_utils import mazepa only loads the package object, not its submodules. The lookup-miss fallback fills the gap on first use. Existing tests that assumed REGISTRY was "fully populated" after setup_environment("all") may need updating to use a name from the always-eager set (the parity test was updated to use lambda instead of brightness_aug).
CLI default flip. zetta run path.cue defaults to auto mode. -l all is the rollback path.
internal/__init__.py chunkedgraph try/except. Previously eager (logged a warning at import time if missing); now lazy (raises ModuleNotFoundError only on first access of build_chunkedgraph_segmentation_flow).
Worker --load_mode try is still hardcoded in get_mazepa_worker_command for back-compat. When the master sets ZETTA_PRELOAD_MODULES, the CLI overrides it to auto; workers without the env var fall back to try (current behavior).

Test plan

Unit: 100% line coverage on the 12 files I authored/modified for this work (scan.py, registry.py, built_in_registrations.py, preload/{__init__, all, core, inference, none, training}.py, parsing/spec_scan.py, common/lazy.py, cli/main.py). 2 lines in scan.py carry # pragma: no cover for filesystem-race OSError paths that aren't reproducible in unit tests.
Local full suite: 2008/2009 tests pass; the 1 pre-existing failure is a docker-fixture environmental issue.
End-to-end smoke:
- setup_environment("none") then build() of a non-eager spec → succeeds via lookup-miss fallback
- setup_environment("auto", cue_path=...) → daemon preloads exactly the computed set
- Subprocess with ZETTA_PRELOAD_MODULES env var → daemon preload list matches the env var
CI green
Production validation: zetta run an internal CUE in a dev k8s cluster, confirm worker pods get the env var and resource usage drops

Pairs with

internal #198 — must merge after this lands so the lazy __init__.py updates in internal/ align with the helper this PR introduces.

🤖 Generated with Claude Code

codecov · 2026-05-15T13:42:18Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (f1a2436) to head (615f56e).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files

```diff
@@             Coverage Diff             @@
##             main     #1253      +/-   ##
===========================================
+ Coverage   99.98%   100.00%   +0.01%     
===========================================
  Files         206       211       +5     
  Lines       10920     11292     +372     
===========================================
+ Hits        10918     11292     +374     
+ Misses          2         0       -2
```


nkemnitz

Pending tests and coverage

Scans the package with `ast` to discover every @register-decorated and direct-call register("name")(target) site without executing the modules. Registry's get_matching_entry now consults this index on miss, importing the candidate module(s) on demand. Behavior is unchanged when REGISTRY is already populated; the fallback only activates on lookups that would have raised. Foundation for upcoming lazy preload modes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>


'none' registers only the always-eager set (forkserver patch, log, builder itself) and skips eager loading in the parent; the lookup-miss fallback imports modules on demand via the static index. 'auto' requires cue_path, scans the spec for @type literals, resolves them through the index, and preloads exactly those modules — typically a handful instead of the full set. Both fall back to the existing behavior on error or unsupported input. Also adds parsing.spec_scan.extract_types() for walking parsed CUE specs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds zetta_utils/common/lazy.py: a small helper that takes a package's name + globals + a tuple of subpackages + a {module: (names,)} re-export mapping and returns __getattr__ + __dir__. Each lazy package becomes a ~10-line declaration instead of repeating the same import plumbing. Lazifies these subpackage __init__.py files (each previously eagerly chained into all of its siblings, defeating the auto preload set's intent):

- training, mazepa, mazepa_layer_processing
- layer, tensor_ops
- augmentations, db_annotations, message_queues, task_management
- cloud_management, cloud_management/resource_allocation, cloud_management/resource_allocation/{gcloud, k8s}
- convnet, convnet/architecture, convnet/architecture/deprecated

mazepa_layer_processing/__init__.py keeps its eager builder.register calls so the "mazepa.Executor" / "mazepa.execute" etc. names land in the static index at scan time.

Behavior change worth noting: setup_environment("all") used to eagerly chain into every submodule via the now-lazy __init__.py cascades, so REGISTRY ended up fully populated at startup. With the lazy treatment the eager preload covers only the always-eager set + whatever the preload/{all,inference,training}.py modules name explicitly. The lookup-miss fallback in registry.get_matching_entry handles the rest on first use, at the cost of one import on first lookup of a never-touched name. Updated test_already_registered_name_skips_fallback to use "lambda" (always-eager) instead of "brightness_aug" which is no longer preloaded.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds register_dynamic_resolver(prefix, fn) to registry. After the literal REGISTRY lookup and the static-index fallback both miss, get_matching_entry consults installed resolvers; on hit the resolver's RegistryEntry is cached in REGISTRY so subsequent lookups skip the resolver. Replaces two import-time loops:

- built_in_registrations.py: enumerated dir(numpy) and registered every routine + numeric constant up front (~290 entries, also pulled the full numpy import). Now a `_resolve_numpy` resolver does the same getattr lookup on demand.
- convnet/architecture/primitives.py: enumerated dir(torch.nn) / dir(torch.optim) / dir(torch.optim.lr_scheduler) / etc. and registered every public class or builtin method (~1100 entries, also pulled torch). Now a `_resolve_torch` resolver walks the dotted path inside torch on demand.

Net effect: numpy and torch are no longer imported by `import zetta_utils` or by `setup_environment("none")`. They get pulled the first time any module that uses them is loaded, or the first time a `np.*` / `torch.*` name is looked up. Explicit decorations like `torch.nn.Sequential` (defined in primitives.py) still win because the static-index fallback runs before the resolver.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…nv var

Workers spawned by execute_on_gcp_with_sqs previously used --load_mode try and let each worker load all preloaded subpackages itself. With auto mode the master knows the minimal preload set for the spec, but had no way to hand it to remote workers, since workers don't see the original CUE. Wires it via an env var on the worker pod:

- setup_environment gains a preload_modules: list[str] | None kwarg. When set with load_mode='auto', skips the CUE scan and uses the list directly.
- CLI reads ZETTA_PRELOAD_MODULES, splits on comma, forces auto mode when present (overriding --load_mode). Backward-compatible: if unset, behavior is unchanged.
- get_mazepa_pod_spec accepts preload_modules and appends a V1EnvVar ZETTA_PRELOAD_MODULES=<comma-list> to the worker container env.
- get_mazepa_worker_deployment and _get_mazepa_deployment thread the list down to get_mazepa_pod_spec.
- get_gcp_with_sqs_config calls a new _compute_worker_preload_modules helper that re-scans ZETTA_RUN_SPEC_PATH (the master's recorded spec file) and computes the preload set, then passes it to every worker group. Failure to compute returns None silently; workers fall back to whatever --load_mode the CLI was started with.

End-to-end verified: a subprocess with ZETTA_PRELOAD_MODULES set spawns the daemon with exactly that list, and lookups for names not in the list still resolve via the static-index fallback.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>


Flips the default preload mode from 'all' to 'auto'. zetta run on a CUE file now scans the spec for @type literals and preloads only the modules needed, instead of importing every preload bundle. Side change: materialize --str_spec to its tempfile path BEFORE setup_environment runs, so auto mode can scan str_spec invocations too. The previous flow wrote the tempfile after setup, which meant str_spec auto would fall back to 'all' for lack of a cue_path. Tempfile path now goes through the same auto path as -p PATH. For users that want the old behavior: pass -l all explicitly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds tests covering every line of the new and changed files in the semi-lazy import work:

- zetta_utils/builder/scan.py
- zetta_utils/builder/registry.py
- zetta_utils/builder/built_in_registrations.py
- zetta_utils/builder/preload/__init__.py
- zetta_utils/parsing/spec_scan.py
- zetta_utils/common/lazy.py
- zetta_utils/cli/main.py

New test files: test_resolvers.py (np.* / torch.* dynamic resolver paths), tests/unit/common/test_lazy.py (make_lazy_module __getattr__ + __dir__ + fallback submodule import + AttributeError path).

Extended test_lazy_fallback.py (failed-import logging, dynamic-resolver exception swallowing, multiple-matches error, register-duplicate-version error, unregister, under-lock recheck). Extended test_scan.py (syntax error files, nonexistent root, large-file path, unreadable file, corrupt cache JSON, wrong-shape cache, multi-root file resolution, cache write OSError, register call with **kwargs).

Fixed test_index_parity.py to also skip lambdas: they're never statically discoverable, so a CLI extra-import test that registers a lambda inside zetta_utils.cli.main no longer trips parity.

Two filesystem-race paths in scan.py (_candidate_files OSError after candidate detection; _cache_key stat OSError) carry `# pragma: no cover`: they guard against files disappearing mid-walk and aren't reliably reproducible in a unit test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

psutil.virtual_memory() reads /proc/meminfo which inside a k8s pod reports the NODE's memory, not the pod's cgroup limit. A pod requesting 13 GiB on a 30 GiB node could be at ~96% of its limit while get_memory_usage() reported ~40%. Misleading for any consumer of RESOURCE_STATS_FILE that cares about pod-relative pressure.

Adds _read_cgroup_memory() which reads memory.current / memory.max (cgroup v2) or memory.usage_in_bytes / memory.limit_in_bytes (cgroup v1). get_memory_usage() prefers the cgroup view and falls back to psutil when:

- the cgroup files are absent (running outside a container)
- the limit is unset / sentinel (no memory constraint configured)
- the limit field contains "max" (cgroup v2 string form)

The OOM tracker sidecar already does the right thing via the k8s API; this brings the in-container reporter in line with it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
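The cgroup-aware reading described above can be sketched as follows. The mount points used are the common defaults: `/sys/fs/cgroup/memory.current` and `memory.max` for v2, and `/sys/fs/cgroup/memory/memory.usage_in_bytes` and `memory.limit_in_bytes` for v1. The threshold for detecting the v1 "unlimited" sentinel is an assumption. `read_cgroup_memory` and `get_memory_usage_percent` are hypothetical names, and the psutil fallback is passed in as a callable.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable, Optional, Tuple

# Assumption: cgroup v1 reports "no limit" as a huge page-aligned number; treat anything this large as unset
_UNLIMITED_THRESHOLD = 1 << 60


def read_cgroup_memory(root: Path = Path("/sys/fs/cgroup")) -> Optional[Tuple[int, int]]:
    """Return (used_bytes, limit_bytes) for this container, or None to signal 'use psutil'."""
    candidates = [
        (root / "memory.current", root / "memory.max"),  # cgroup v2
        (root / "memory" / "memory.usage_in_bytes", root / "memory" / "memory.limit_in_bytes"),  # v1
    ]
    for usage_file, limit_file in candidates:
        try:
            usage_text = usage_file.read_text().strip()
            limit_text = limit_file.read_text().strip()
        except OSError:
            continue  # files absent: outside a container, or the other cgroup version
        if limit_text == "max":
            return None  # v2 string form: no memory limit configured
        limit = int(limit_text)
        if limit <= 0 or limit >= _UNLIMITED_THRESHOLD:
            return None  # v1 sentinel: effectively unlimited
        return int(usage_text), limit
    return None


def get_memory_usage_percent(fallback: Callable[[], float], root: Path = Path("/sys/fs/cgroup")) -> float:
    """Pod-relative usage when a cgroup limit exists, else the fallback (e.g. node-wide psutil)."""
    cgroup = read_cgroup_memory(root)
    if cgroup is None:
        return fallback()
    used, limit = cgroup
    return 100.0 * used / limit
```

In a pod, `get_memory_usage_percent(lambda: psutil.virtual_memory().percent)` would report usage relative to the pod limit. It falls back to the node-wide psutil figure only when no limit applies.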

Points to the rebased internal branch carrying the matching lazy-resolve __init__ changes. Will revert / re-bump to internal main after the internal PR merges. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

dodamih force-pushed the dodam/semi_lazy_import branch 2 times, most recently from 1537137 to 1460fe9 on May 18, 2026 13:27

dodamih requested a review from supersergiy May 18, 2026 15:36

dodamih force-pushed the dodam/semi_lazy_import branch 2 times, most recently from 4693c00 to 6bb7317 on May 19, 2026 15:23

nkemnitz approved these changes May 19, 2026


dodamih force-pushed the dodam/semi_lazy_import branch from 6bb7317 to 2215821 on May 20, 2026 01:11

dodamih and others added 8 commits May 19, 2026 19:18

feat: final dst in subchunkable never has cache enabled

05ccf6c

dodamih force-pushed the dodam/semi_lazy_import branch from 2215821 to 40616d3 on May 20, 2026 02:18

dodamih and others added 2 commits May 19, 2026 20:58

chore: bump internal submodule pointer for review

4f9c867


dodamih force-pushed the dodam/semi_lazy_import branch from 40616d3 to 4f9c867 on May 20, 2026 03:58

nkemnitz added 3 commits May 20, 2026 07:01

tests(task_management): fix missing coverage

5dba3d0

tests(lazy_import): force import all submodules

56d71a4

chore: update submodules

615f56e

nkemnitz merged commit 9db4f2f into main May 20, 2026
11 checks passed

nkemnitz deleted the dodam/semi_lazy_import branch May 20, 2026 14:22

nkemnitz merged 13 commits into main from dodam/semi_lazy_import
