Semi-lazy imports + auto-mode forkserver preload #1253
Merged
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@             Coverage Diff              @@
##              main     #1253    +/-   ##
===========================================
+ Coverage   99.98%   100.00%   +0.01%
===========================================
  Files         206       211       +5
  Lines       10920     11292     +372
===========================================
+ Hits        10918     11292     +374
+ Misses          2         0       -2
nkemnitz (Collaborator) approved these changes on May 19, 2026 and left a comment:
Pending tests and coverage
Scans the package with `ast` to discover every @register-decorated and
direct-call register("name")(target) site without executing the modules.
Registry's get_matching_entry now consults this index on miss, importing
the candidate module(s) on demand. Behavior is unchanged when REGISTRY is
already populated; the fallback only activates on lookups that would have
raised. Foundation for upcoming lazy preload modes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
'none' registers only the always-eager set (forkserver patch, log, builder itself) and skips eager loading in the parent; the lookup-miss fallback imports modules on demand via the static index.
'auto' requires cue_path, scans the spec for @type literals, resolves them through the index, and preloads exactly those modules, typically a handful instead of the full set.
Both fall back to the existing behavior on error or unsupported input.
Also adds parsing.spec_scan.extract_types() for walking parsed CUE specs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
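As an illustration of the 'auto' flow, the sketch below walks an already-parsed spec (plain dicts and lists) for `@type` strings and resolves them through a name-to-module index. The real signature of `parsing.spec_scan.extract_types()` is not shown in this PR text, so `collect_type_literals` and `modules_for` are hypothetical stand-ins.

```python
from __future__ import annotations

from typing import Any


def collect_type_literals(spec: Any) -> set[str]:
    """Collect every string stored under an "@type" key in a nested spec."""
    found: set[str] = set()
    stack = [spec]
    while stack:
        node = stack.pop()
        if isinstance(node, dict):
            type_name = node.get("@type")
            if isinstance(type_name, str):
                found.add(type_name)
            stack.extend(node.values())
        elif isinstance(node, (list, tuple)):
            stack.extend(node)
    return found


def modules_for(spec: Any, index: dict[str, str]) -> list[str]:
    """Resolve the spec's @type names to modules; unknown names are skipped."""
    return sorted({index[name] for name in collect_type_literals(spec) if name in index})
```

Names missing from the index are simply left out of the preload list here, on the assumption that the lookup-miss fallback would handle them at build time.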
Adds zetta_utils/common/lazy.py — a small helper that takes a package's
name + globals + a tuple of subpackages + a {module: (names,)} re-export
mapping and returns __getattr__ + __dir__. Each lazy package becomes a
~10-line declaration instead of repeating the same import plumbing.
Lazifies these subpackage __init__.py files (each previously eagerly
chained into all of its siblings, defeating the auto preload set's
intent):
- training, mazepa, mazepa_layer_processing
- layer, tensor_ops
- augmentations, db_annotations, message_queues, task_management
- cloud_management, cloud_management/resource_allocation,
cloud_management/resource_allocation/{gcloud, k8s}
- convnet, convnet/architecture, convnet/architecture/deprecated
mazepa_layer_processing/__init__.py keeps its eager builder.register
calls so the "mazepa.Executor" / "mazepa.execute" etc. names land in the
static index at scan time.
Behavior change worth noting: setup_environment("all") used to eagerly
chain into every submodule via the now-lazy __init__.py cascades, so
REGISTRY ended up fully populated at startup. With the lazy treatment
the eager preload covers only the always-eager set + whatever the
preload/{all,inference,training}.py modules name explicitly. The
lookup-miss fallback in registry.get_matching_entry handles the rest on
first use, at the cost of one import on first lookup of a never-touched
name. Updated test_already_registered_name_skips_fallback to use
"lambda" (always-eager) instead of "brightness_aug" which is no longer
preloaded.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
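A helper of the kind described here could look roughly like the sketch below. It follows the argument list given in the commit message (package name, globals, subpackages, re-export mapping) and relies on module-level `__getattr__` / `__dir__` (PEP 562); the body is an assumption, not the contents of `zetta_utils/common/lazy.py`.

```python
from __future__ import annotations

import importlib
from typing import Any, Callable


def make_lazy_module(
    pkg_name: str,
    pkg_globals: dict[str, Any],
    subpackages: tuple[str, ...],
    reexports: dict[str, tuple[str, ...]],
) -> tuple[Callable[[str], Any], Callable[[], list[str]]]:
    """Build module-level __getattr__ and __dir__ that import on first access."""
    # Invert {module: (names,)} into {name: module} for direct lookup.
    owner = {name: mod for mod, names in reexports.items() for name in names}

    def __getattr__(name: str) -> Any:
        if name in owner:
            module = importlib.import_module(f".{owner[name]}", pkg_name)
            value = getattr(module, name)
        else:
            # Unknown names are tried as submodules of the package.
            try:
                value = importlib.import_module(f".{name}", pkg_name)
            except ModuleNotFoundError as exc:
                if exc.name != f"{pkg_name}.{name}":
                    raise  # a dependency inside the submodule is missing
                raise AttributeError(
                    f"module {pkg_name!r} has no attribute {name!r}"
                ) from exc
        pkg_globals[name] = value  # cache so __getattr__ is not hit again
        return value

    def __dir__() -> list[str]:
        return sorted({*pkg_globals, *subpackages, *owner})

    return __getattr__, __dir__
```

In a package `__init__.py` this would be used as `__getattr__, __dir__ = make_lazy_module(__name__, globals(), ("sub",), {"sub": ("Thing",)})`, where `sub` and `Thing` are placeholder names.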
Adds register_dynamic_resolver(prefix, fn) to registry. After the literal
REGISTRY lookup and the static-index fallback both miss, get_matching_entry
consults installed resolvers; on hit the resolver's RegistryEntry is cached
in REGISTRY so subsequent lookups skip the resolver.
Replaces two import-time loops:
- built_in_registrations.py: enumerated dir(numpy) and registered every
routine + numeric constant up front (~290 entries, also pulled the
full numpy import). Now a `_resolve_numpy` resolver does the same
getattr lookup on demand.
- convnet/architecture/primitives.py: enumerated dir(torch.nn) /
dir(torch.optim) / dir(torch.optim.lr_scheduler) / etc. and
registered every public class or builtin method (~1100 entries, also
pulled torch). Now a `_resolve_torch` resolver walks the dotted path
inside torch on demand.
Net effect: numpy and torch are no longer imported by `import zetta_utils`
or by `setup_environment("none")`. They get pulled the first time any
module that uses them is loaded, or the first time a `np.*` / `torch.*`
name is looked up.
Explicit decorations like `torch.nn.Sequential` (defined in primitives.py)
still win because the static-index fallback runs before the resolver.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
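The resolver mechanism can be sketched as follows. This is a simplified model: `REGISTRY` is a plain dict here, the static-index step that the commit places before the resolvers is omitted, `lookup` and `resolve_dotted` are hypothetical names, and the stdlib `math` module stands in for numpy and torch so the example runs without them.

```python
from __future__ import annotations

import importlib
from typing import Any, Callable

REGISTRY: dict[str, Any] = {}
_RESOLVERS: dict[str, Callable[[str], Any]] = {}


def register_dynamic_resolver(prefix: str, fn: Callable[[str], Any]) -> None:
    """Install `fn` for names starting with `prefix`; it receives the remainder."""
    _RESOLVERS[prefix] = fn


def resolve_dotted(root_module: str, dotted: str) -> Any:
    """Import `root_module` only now, then walk `dotted` with getattr."""
    obj = importlib.import_module(root_module)
    for part in dotted.split("."):
        obj = getattr(obj, part)
    return obj


def lookup(name: str) -> Any:
    """Literal lookup first, then resolvers; resolver hits are cached."""
    if name in REGISTRY:
        return REGISTRY[name]
    for prefix, fn in _RESOLVERS.items():
        if name.startswith(prefix):
            try:
                entry = fn(name[len(prefix):])
            except (ImportError, AttributeError):
                continue  # this resolver cannot produce the name
            REGISTRY[name] = entry
            return entry
    raise KeyError(name)


# Stand-in for the np.* / torch.* resolvers: nothing is imported until first lookup.
register_dynamic_resolver("math.", lambda rest: resolve_dotted("math", rest))
```

Because the literal lookup runs first, an explicitly registered name with the same prefix still takes precedence over the resolver, which matches the ordering the commit describes.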
…nv var
Workers spawned by execute_on_gcp_with_sqs previously used --load_mode try
and let each worker load all preloaded subpackages itself. With auto mode
the master knows the minimal preload set for the spec, but had no way to
hand it to remote workers — workers don't see the original CUE.
Wires it via an env var on the worker pod:
- setup_environment gains a preload_modules: list[str] | None kwarg.
When set with load_mode='auto', skips the CUE scan and uses the list
directly.
- CLI reads ZETTA_PRELOAD_MODULES, splits on comma, forces auto mode
when present (overriding --load_mode). Backward-compatible: if unset,
behavior is unchanged.
- get_mazepa_pod_spec accepts preload_modules and appends a V1EnvVar
ZETTA_PRELOAD_MODULES=<comma-list> to the worker container env.
- get_mazepa_worker_deployment and _get_mazepa_deployment thread the
list down to get_mazepa_pod_spec.
- get_gcp_with_sqs_config calls a new _compute_worker_preload_modules
helper that re-scans ZETTA_RUN_SPEC_PATH (the master's recorded spec
file) and computes the preload set, then passes it to every worker
group. Failure to compute returns None silently — workers fall back
to whatever --load_mode the CLI was started with.
End-to-end verified: a subprocess with ZETTA_PRELOAD_MODULES set spawns
the daemon with exactly that list, and lookups for names not in the list
still resolve via the static-index fallback.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
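The CLI side of this wiring amounts to a small parsing step. The sketch below shows one way to do it; the function names are hypothetical, and treating an empty value as unset and stripping whitespace around entries are assumptions that the commit message does not spell out.

```python
from __future__ import annotations

import os
from typing import Mapping


def preload_modules_from_env(env: Mapping[str, str] = os.environ) -> list[str] | None:
    """Return the comma-separated module list, or None when the variable is unset."""
    raw = env.get("ZETTA_PRELOAD_MODULES")
    if not raw:
        return None  # assumption: an empty value behaves like an unset one
    return [part.strip() for part in raw.split(",") if part.strip()]


def effective_load_mode(cli_load_mode: str, env: Mapping[str, str] = os.environ) -> str:
    """The env var, when present, overrides whatever --load_mode was passed."""
    return "auto" if preload_modules_from_env(env) is not None else cli_load_mode
```

With the variable unset, `effective_load_mode("try", {})` stays at `"try"`, which corresponds to the backward-compatible path described above.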
Flips the default preload mode from 'all' to 'auto'. zetta run on a CUE file now scans the spec for @type literals and preloads only the modules needed, instead of importing every preload bundle.
Side change: materialize --str_spec to its tempfile path BEFORE setup_environment runs, so auto mode can scan str_spec invocations too. The previous flow wrote the tempfile after setup, which meant str_spec auto would fall back to 'all' for lack of a cue_path. Tempfile path now goes through the same auto path as -p PATH.
For users that want the old behavior: pass -l all explicitly.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds tests covering every line of the new and changed files in the semi-lazy import work:
- zetta_utils/builder/scan.py
- zetta_utils/builder/registry.py
- zetta_utils/builder/built_in_registrations.py
- zetta_utils/builder/preload/__init__.py
- zetta_utils/parsing/spec_scan.py
- zetta_utils/common/lazy.py
- zetta_utils/cli/main.py
New test files: test_resolvers.py (np.* / torch.* dynamic resolver paths), tests/unit/common/test_lazy.py (make_lazy_module __getattr__ + __dir__ + fallback submodule import + AttributeError path).
Extended test_lazy_fallback.py (failed-import logging, dynamic-resolver exception swallowing, multiple-matches error, register-duplicate-version error, unregister, under-lock recheck).
Extended test_scan.py (syntax error files, nonexistent root, large-file path, unreadable file, corrupt cache JSON, wrong-shape cache, multi-root file resolution, cache write OSError, register call with **kwargs).
Fixed test_index_parity.py to also skip lambdas: they're never statically discoverable, so a CLI extra-import test that registers a lambda inside zetta_utils.cli.main no longer trips parity.
Two filesystem-race paths in scan.py (_candidate_files OSError after candidate detection; _cache_key stat OSError) carry `# pragma: no cover`: they guard against files disappearing mid-walk and aren't reliably reproducible in a unit test.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
psutil.virtual_memory() reads /proc/meminfo which inside a k8s pod reports the NODE's memory, not the pod's cgroup limit. A pod requesting 13 GiB on a 30 GiB node could be at ~96% of its limit while get_memory_usage() reported ~40%. Misleading for any consumer of RESOURCE_STATS_FILE that cares about pod-relative pressure.
Adds _read_cgroup_memory() which reads memory.current / memory.max (cgroup v2) or memory.usage_in_bytes / memory.limit_in_bytes (cgroup v1). get_memory_usage() prefers the cgroup view and falls back to psutil when:
- the cgroup files are absent (running outside a container)
- the limit is unset / sentinel (no memory constraint configured)
- the limit field contains "max" (cgroup v2 string form)
The OOM tracker sidecar already does the right thing via the k8s API; this brings the in-container reporter in line with it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
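A cgroup-aware reader along these lines might look like the sketch below. It is not the `_read_cgroup_memory()` from this commit: the mount locations under `/sys/fs/cgroup`, the threshold used to detect the "no limit" sentinel, and the function names are all assumptions, and the psutil fallback is left to the caller.

```python
from __future__ import annotations

from pathlib import Path

# (usage file, limit file) relative to the cgroup root: v2 layout first, then v1.
_CANDIDATES = (
    ("memory.current", "memory.max"),
    ("memory/memory.usage_in_bytes", "memory/memory.limit_in_bytes"),
)
# Assumption: cgroup v1 reports a huge number when no limit is configured.
_NO_LIMIT = 1 << 60


def read_cgroup_memory(root: Path = Path("/sys/fs/cgroup")) -> tuple[int, int] | None:
    """Return (used_bytes, limit_bytes), or None when there is no usable limit."""
    for usage_name, limit_name in _CANDIDATES:
        try:
            usage = (root / usage_name).read_text().strip()
            limit = (root / limit_name).read_text().strip()
        except OSError:
            continue  # files absent for this cgroup version
        if limit == "max":
            return None  # cgroup v2 string form of "unlimited"
        try:
            used_bytes, limit_bytes = int(usage), int(limit)
        except ValueError:
            return None
        if limit_bytes <= 0 or limit_bytes >= _NO_LIMIT:
            return None
        return used_bytes, limit_bytes
    return None


def memory_percent(root: Path = Path("/sys/fs/cgroup")) -> float | None:
    """Pod-relative memory use in percent; None means fall back to another source."""
    reading = read_cgroup_memory(root)
    if reading is None:
        return None
    used_bytes, limit_bytes = reading
    return 100.0 * used_bytes / limit_bytes
```

Returning `None` for every "no usable limit" case keeps the three fallback conditions listed in the commit in one place, so the caller only needs a single check before consulting psutil.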
Points to the rebased internal branch carrying the matching lazy-resolve __init__ changes. Will revert / re-bump to internal main after the internal PR merges.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Replaces the all-or-nothing forkserver preload with a CUE-driven `auto` mode that loads only the modules a given spec actually needs. Adds a static `@register` index, a lookup-miss fallback for late-bound names, dynamic resolvers for `np.*` / `torch.*`, and a shared `make_lazy_module` helper used to lazify ~25 subpackage `__init__.py` files. Wires the computed preload list to remote workers via env var. Flips the CLI default to `auto`.

Motivation
`zetta run` and `setup_environment("all")` used to import every preload bundle eagerly (~1.4 GB RSS, ~12-15 s on a typical laptop) regardless of what the spec needed. Tri's debug iteration in particular hits this on every CUE typo. After this PR:
- `import zetta_utils` (bare)

Everything else stays opt-out: `zetta run -l all` still works for any case where lazy is undesirable.

Architecture
Phase 0: Static `@register` index (scan.py)
AST-walk the package looking for `@register('name')` decorators and `register('name')(target)` direct calls. Build a `name → module` index without executing any imports. Disk-cached at `~/.cache/zetta_utils/builder_index.json`. Cold scan ~220 ms; warm <50 ms.

Phase 1: Lookup-miss fallback (registry.py)
`get_matching_entry` consults the index when REGISTRY misses, imports the candidate module(s), retries. Idempotent, thread-safe. Behavior is unchanged when REGISTRY is fully populated; the fallback only fires on lookups that would have raised.

Phase 2: `none` preload mode
Skip the eager `load_*_modules()` step in the parent. Daemon still spawns eagerly with `preload/none.py` (always-eager: `parallel`, `log`, `builder`). Lookup-miss fallback handles everything else on demand.

Phase 3: `auto` preload mode
Parse the CUE, walk it for `@type` literals (`parsing/spec_scan.py`), resolve through the static index to a minimal module set (`builder/preload/__init__.py:compute_preload_set`), feed that list directly to `multiprocessing.set_forkserver_preload(...)`; no preload shim file needed.

Phase 4: Lazy subpackage `__init__.py` (common/lazy.py)
A package that re-exports many submodules pays a transitive import cost on first `import package` even when the caller wants one symbol. `make_lazy_module(__name__, globals(), subpackages, reexports_by_module)` returns `__getattr__` + `__dir__` that resolve names on first access. Applied to:
- mazepa, mazepa_layer_processing, layer, tensor_ops, augmentations, db_annotations, message_queues, task_management, training
- cloud_management, cloud_management/resource_allocation/{__init__, gcloud, k8s}
- convnet, convnet/architecture, convnet/architecture/deprecated

The helper falls back to `importlib.import_module(".{name}", pkg)` for unknown names, mirroring Python's natural submodule attribute behavior, so `pkg.X.Y` works whether or not `X` is explicitly declared.

Dynamic resolvers (built_in_registrations.py)
`built_in_registrations.py` and `convnet/architecture/primitives.py` previously enumerated `dir(numpy)` and `dir(torch.nn)` etc. at import time, registering ~1400 names + pulling numpy + torch. Now `_resolve_numpy("np.foo")` and `_resolve_torch("torch.nn.Bar")` do `getattr(numpy, "foo")` / `getattr(torch.nn, "Bar")` on first lookup. Net: numpy and torch are no longer in the always-eager set; they get loaded only when first needed.

Phase 5: Remote workers (cloud_management/resource_allocation/k8s)
Workers spawned by `execute_on_gcp_with_sqs` previously used `--load_mode try` and let each worker load all preloaded subpackages itself. With auto mode, the master computes the minimal preload set from the original CUE and ships it via a `ZETTA_PRELOAD_MODULES` env var on the worker pod. The CLI reads the env var on startup and passes the explicit list to `setup_environment(load_mode='auto', preload_modules=[...])`. Master's `_compute_worker_preload_modules` re-scans `ZETTA_RUN_SPEC_PATH` to derive the list.

Default flip
`--load_mode` default is now `auto`. `-l all` still available as the safety hatch.

Files changed
New:
- `zetta_utils/builder/scan.py`: AST-driven static index
- `zetta_utils/builder/preload/none.py`: minimal preload bundle
- `zetta_utils/parsing/spec_scan.py`: `extract_types` for CUE specs
- `zetta_utils/common/lazy.py`: `make_lazy_module` helper
- `tests/unit/builder/test_scan.py` (17 tests)
- `tests/unit/builder/test_lazy_fallback.py` (11 tests)
- `tests/unit/builder/test_index_parity.py` (3 tests)
- `tests/unit/builder/test_none_mode.py` (4 tests)
- `tests/unit/builder/test_auto_mode.py` (2 tests)
- `tests/unit/builder/test_compute_preload.py` (5 tests)
- `tests/unit/builder/test_resolvers.py` (15 tests)
- `tests/unit/parsing/test_spec_scan.py` (6 tests)
- `tests/unit/common/test_lazy.py` (7 tests)

Modified:
- `zetta_utils/__init__.py`: `setup_environment(load_mode, cue_path, preload_modules)`; new `auto` and `none` modes
- `zetta_utils/builder/{registry, __init__, built_in_registrations}.py`: fallback path, dynamic resolvers
- `zetta_utils/builder/preload/__init__.py`: `compute_preload_set`
- `zetta_utils/cli/main.py`: `--load_mode auto` default; reads `ZETTA_PRELOAD_MODULES` env var
- `zetta_utils/cloud_management/resource_allocation/k8s/{pod, deployment}.py`: thread `preload_modules` to worker pod env
- `zetta_utils/mazepa_addons/configurations/execute_on_gcp_with_sqs.py`: compute and ship preload list
- `zetta_utils/convnet/architecture/primitives.py`: drop `torch.*` import-time loops
- `__init__.py` files lazified via the helper

Behavior changes worth knowing about
- `setup_environment("all")` no longer eagerly cascades into every submodule. The lazified `__init__.py` files mean `from zetta_utils import mazepa` only loads the package object, not its submodules. The lookup-miss fallback fills the gap on first use. Existing tests that assumed REGISTRY was "fully populated" after `setup_environment("all")` may need updating to use a name from the always-eager set (the parity test was updated to use `lambda` instead of `brightness_aug`).
- CLI default flip. `zetta run path.cue` defaults to `auto` mode. `-l all` is the rollback path.
- `internal/__init__.py` chunkedgraph try/except. Was eager (logged a warning at import time if missing); now lazy (raises `ModuleNotFoundError` only on first access of `build_chunkedgraph_segmentation_flow`).
- Worker `--load_mode try` still hardcoded in `get_mazepa_worker_command` for back-compat. When `ZETTA_PRELOAD_MODULES` is set by the master, the CLI overrides to `auto`. Workers without the env var fall back to `try` (current behavior).

Test plan
- `scan.py`, `registry.py`, `built_in_registrations.py`, `preload/{__init__, all, core, inference, none, training}.py`, `parsing/spec_scan.py`, `common/lazy.py`, `cli/main.py`. 2 lines in `scan.py` carry `# pragma: no cover` for filesystem-race OSError paths that aren't reproducible in unit tests.
- `setup_environment("none")` then `build()` of a non-eager spec → succeeds via lookup-miss fallback
- `setup_environment("auto", cue_path=...)` → daemon preloads exactly the computed set
- `ZETTA_PRELOAD_MODULES` env var → daemon preload list matches the env var
- `zetta run` an internal CUE in a dev k8s cluster, confirm worker pods get the env var and resource usage drops

Pairs with
internal #198: must merge after this lands so the lazy `__init__.py` updates in `internal/` align with the helper this PR introduces.

🤖 Generated with Claude Code