Supervoxel splitting with base+fork support and locks #534
Open
akhileshh wants to merge 51 commits into
Drop the inline propagate_to_coarser_scales call from write_seg; coarser mips are now the async downsample worker's responsibility. write_seg is back to a single base-scale tensorstore write, so SV splits no longer block on the full pyramid update. TestWriteSeg updated to assert the coarser scales stay zero after write_seg (propagation tested separately via TestPropagateToCoarserScales).
split_supervoxel now returns its base-resolution bbox. split_with_sv_splits collects one per call and attaches the list to Result.seg_bbox (new optional field). publish_edit includes the list in the payload and sets downsample="true" so the worker only runs on edits that touched base seg. List kept unmerged — lets the worker skip tiles outside the actual change region.
workers/downsample_worker.py consumes edits-exchange messages flagged downsample="true" and writes each non-base mip within the SV-split bbox. graph/downsample.py splits the region into pyramid_blocks (sized so no two blocks share a storage chunk at any mip), then either tinybrain'd in one call (fast path, typical small edits) or per-mip (fallback when base read exceeds memory budget). Write filtering keeps OCDBT delta proportional to the actual change. DownsampleBlockLock serializes overlapping jobs via kvdbclient's row-key lock API; 26-byte hash-prefixed keys avoid tablet hot-spots. Depends on kvdbclient lock_by_row_key / unlock_by_row_key / renew_lock_by_row_key landing first.
Used by the async downsample worker as the mip-pyramid kernel.
Closes a concurrency gap where two SV splits on overlapping L2 chunks but distinct roots can't be serialized by root locks — they acquire disjoint root-lock sets and race on seg state. L2ChunkLock serializes them via the kvdbclient row-key lock primitive. Row key = 2-byte blake2b hash + 8-byte uint64 chunk_id (10 bytes). Hash prefix keeps spatially-clustered L2 chunks from hot-spotting a single bigtable tablet under concurrent load. Primitive only — callers land separately. RowKeyLockRegistry helper moved to tests/helpers.py so L2ChunkLock and DownsampleBlockLock tests share it instead of duplicating.
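A minimal sketch of the 10-byte row key layout described above. The byte order (big-endian) and the choice to hash the packed chunk ID are assumptions for illustration, and `l2_chunk_lock_row_key` is a hypothetical name; the PR only states the 2-byte blake2b prefix plus 8-byte uint64 layout.

```python
import hashlib
import struct


def l2_chunk_lock_row_key(chunk_id: int) -> bytes:
    """Build a hash-prefixed row key: 2-byte blake2b digest + 8-byte uint64."""
    # Assumption: the chunk ID is packed as a big-endian unsigned 64-bit int.
    packed = struct.pack(">Q", chunk_id)
    # Assumption: the prefix is a 2-byte blake2b digest of the packed ID.
    prefix = hashlib.blake2b(packed, digest_size=2).digest()
    return prefix + packed


key = l2_chunk_lock_row_key(0x0200_0000_0000_0001)
print(len(key), key.hex())
```

Because the prefix is derived from the ID itself, neighboring chunk IDs tend to land in unrelated parts of the key space, which is the hot-spot avoidance the commit describes.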
Callers now get Cut(atomic_edges) | PreviewCut(ccs, illegal_split) | SvSplitRequired(sv_remapping) instead of unwrapping-by-convention or catching SupervoxelSplitRequiredError. The exception still unwinds inside LocalMincutGraph — cheapest way to bail out of deep path code — but it's caught once at the run_multicut boundary and never escapes, so callers don't use raise/catch for control flow.
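The tagged result can be sketched roughly as below. The class and field names come from the commit message; the field types and the `describe` dispatcher are hypothetical and only illustrate how a caller might branch without raise/catch.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Cut:
    atomic_edges: list  # assumed type: list of (sv_a, sv_b) pairs


@dataclass(frozen=True)
class PreviewCut:
    ccs: list  # assumed type: list of connected components
    illegal_split: bool


@dataclass(frozen=True)
class SvSplitRequired:
    sv_remapping: dict  # assumed type: sv_id -> representative id


MulticutResult = Union[Cut, PreviewCut, SvSplitRequired]


def describe(result: MulticutResult) -> str:
    """Hypothetical caller: dispatch on the tag instead of catching an exception."""
    if isinstance(result, Cut):
        return f"cut {len(result.atomic_edges)} edges"
    if isinstance(result, PreviewCut):
        return f"preview with {len(result.ccs)} components"
    if isinstance(result, SvSplitRequired):
        return f"split {len(result.sv_remapping)} supervoxels first"
    raise TypeError(f"unexpected result type: {type(result).__name__}")
```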
MulticutOperation._apply dispatches on the tagged multicut result and, when an SV split is needed, calls the new edits_sv.split_supervoxels under its surrounding RootLock, refreshes source/sink SV IDs from seg, and retries multicut against the post-split graph. Root lock spans the whole critical section; L2ChunkLock held only around the split loop. This closes two races that existed when split_with_sv_splits handled the flow outside any lock: same-root (root lock now never released between multicut and commit) and cross-root (L2ChunkLock serializes overlapping split regions). split_with_sv_splits is deleted; handle_split calls cg.remove_edges directly.
Pre-compute each rep's bbox from the chunk coords of its CC members in sv_remapping (no coord-padding, no resolution-axis assumption). split_supervoxels builds the union lock set across reps — sparse chunks plus one L2-chunk margin for update_edges's 1-voxel overlap read — acquires once, then loops per-rep splits. _update_chunks surfaces the change_chunks that actually got new SV IDs; write_seg_chunks fires one tensorstore future per change-chunk and awaits together, so only chunks with real label changes hit OCDBT. Gap chunks between CC pieces and neighbor chunks read for the overlap never get rewritten, keeping the delta proportional to the edit. split_supervoxels also threads back the fresh source/sink SV IDs from the in-memory new_seg block (same bytes that just landed on storage), so the retry multicut sees current IDs without an extra seg read. Drops _get_whole_sv (dead since the sv_remapping switch). Adds a high-level architecture doc covering the end-to-end flow, concurrency design, and durable invariants.
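One way to picture the union lock set with its one-chunk margin is the sketch below. It assumes chunks are identified by integer (x, y, z) coordinates and that the margin covers the full 26-neighborhood; `with_one_chunk_margin` is a hypothetical helper, and clipping to the volume bounds is left out.

```python
from itertools import product


def with_one_chunk_margin(chunks):
    """Return the input chunk coords plus every chunk within one step of them."""
    offsets = list(product((-1, 0, 1), repeat=3))
    locked = set()
    for x, y, z in chunks:
        for dx, dy, dz in offsets:
            locked.add((x + dx, y + dy, z + dz))
    return locked


# Union across reps first, then dilate once, so shared chunks are locked once.
rep_chunks = [{(0, 0, 0)}, {(1, 0, 0)}]
lock_set = with_one_chunk_margin(set().union(*rep_chunks))
print(len(lock_set))
```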
Enable opening a CG's OCDBT at a prior commit via the driver's `version` spec field (int generation or ISO-8601 commit_time upper bound). Groundwork for operator-driven replay of failed SV splits, which needs clean pre-op reads against append-only storage. OCDBT stamps commits from absl::Now() with no caller-override hook, so pins will use OperationTimeStamp captured under the L2 chunk lock rather than aligning OCDBT commit times to operation time.
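As an illustration of the `version` field, the sketch below builds a plain OCDBT kvstore spec dict with an optional pin. The helper name and the example bucket path are hypothetical; the PR's own builder is build_cg_ocdbt_spec, whose signature is not shown here.

```python
def pinned_ocdbt_kvstore_spec(base, version=None):
    """Build an OCDBT kvstore spec dict, optionally pinned to a prior commit.

    `version` is either an int generation or an ISO-8601 string that acts as
    an upper bound on commit_time, as the commit message describes.
    """
    spec = {"driver": "ocdbt", "base": base}
    if version is not None:
        spec["version"] = version
    return spec


# Hypothetical base path, for illustration only.
by_generation = pinned_ocdbt_kvstore_spec("gs://example-bucket/ws/ocdbt/", version=42)
by_time = pinned_ocdbt_kvstore_spec(
    "gs://example-bucket/ws/ocdbt/", version="2024-01-01T00:00:00Z"
)
```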
split_supervoxels is now a pure planner returning a SplitResult (seg_bboxes, source_ids_fresh, sink_ids_fresh, seg_writes, bigtable_rows). No lock acquisition, no writes. The caller (MulticutOperation._apply) holds the L2 chunk locks and fires the consolidated persist — OCDBT chunks + bigtable rows — inside an inner lock scope. seg_writes is a flat list of (voxel_slices, data) pairs across all reps so write_seg_chunks fires every chunk write as one parallel tensorstore batch. Removes the per-rep serialization in the old write_seg_chunks loop. get_seg_source_and_destination_ocdbt gains a pinned_at kwarg, forwarded to build_cg_ocdbt_spec — used later by the recovery path.
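A rough shape for the planner's return value, using the field names listed above. The field types and the `flatten_seg_writes` helper are assumptions; the point is that per-rep writes collapse into one flat list that can be fired as a single batch.

```python
from dataclasses import dataclass, field


@dataclass
class SplitResult:
    # Field names are from the commit message; the types are assumed.
    seg_bboxes: list = field(default_factory=list)
    source_ids_fresh: list = field(default_factory=list)
    sink_ids_fresh: list = field(default_factory=list)
    seg_writes: list = field(default_factory=list)  # (voxel_slices, data) pairs
    bigtable_rows: list = field(default_factory=list)


def flatten_seg_writes(per_rep_writes):
    """Hypothetical helper: merge per-rep write lists into one flat batch."""
    return [pair for rep_writes in per_rep_writes for pair in rep_writes]


rep_a = [((slice(0, 4), slice(0, 4), slice(0, 4)), "block-a")]
rep_b = [((slice(4, 8), slice(0, 4), slice(0, 4)), "block-b1"),
         ((slice(8, 12), slice(0, 4), slice(0, 4)), "block-b2")]
plan = SplitResult(seg_writes=flatten_seg_writes([rep_a, rep_b]))
```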
New L2 chunk counterpart to IndefiniteRootLock, keyed by chunk row. L2ChunkLock now acquires via lock_by_row_key_with_indefinite so a temporal acquire sees a crashed op's indefinite cell and refuses. IndefiniteL2ChunkLock records its chunk scope on the op-log row's L2ChunkLockScope column at __enter__ and clears it on clean exit, giving recovery a durable scope without a bigtable-wide scan. Both indefinite locks (root and L2 chunk) now short-circuit __exit__ when an exception is propagating: cells stay held, scope stays set. Partial writes may exist after an exception; leaving the cells forces subsequent ops to refuse at lock-acquire and the operator to run recovery explicitly. privileged_mode=True on either lock is the operator recovery escape hatch: skips acquire, pre-populates acquired_keys so __exit__'s value-matched release deletes the crashed op's cells. RowKeyLockRegistry (test helper) gains the three new kvdbclient primitives.
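The exit behavior described above can be sketched against an in-memory dict standing in for the lock cells. `IndefiniteLockSketch` is a hypothetical stand-in, not the PR's class, and the check-then-set acquire here is not atomic the way a real row-key lock would be.

```python
class IndefiniteLockSketch:
    """Toy indefinite lock: cells stay held when the body raises."""

    def __init__(self, cells, keys, op_id, privileged_mode=False):
        self.cells = cells  # dict: key -> op_id holding it
        self.keys = list(keys)
        self.op_id = op_id
        self.privileged_mode = privileged_mode
        self.acquired_keys = []

    def __enter__(self):
        if self.privileged_mode:
            # Recovery path: skip acquire, adopt the crashed op's keys.
            self.acquired_keys = list(self.keys)
            return self
        held = [k for k in self.keys if k in self.cells]
        if held:
            raise RuntimeError(f"indefinite lock held on {held}")
        for key in self.keys:
            self.cells[key] = self.op_id
            self.acquired_keys.append(key)
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            # Exception propagating: leave cells set so later ops refuse.
            return False
        for key in self.acquired_keys:
            # Value-matched release: only delete cells this op owns.
            if self.cells.get(key) == self.op_id:
                del self.cells[key]
        return False
```

With this shape, a later operator run that passes the crashed op's ID with privileged_mode=True exits cleanly and deletes exactly the cells that op left behind.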
Operator recovery for SV-split ops that crashed mid-write. A worker death inside IndefiniteL2ChunkLock leaves per-chunk indefinite cells set and records the chunk scope on the op-log row. Recovery reverts partial OCDBT writes using a version-pinned read of pre-op voxels, then replays the op normally. list_stuck scans OperationLogs for ops still at CREATED past a min-age threshold. replay(cg, op_id) runs cleanup_partial_writes followed by repair.edits.repair_operation(..., unlock=True); IndefiniteL2ChunkLock's privileged-mode __exit__ deletes the crashed op's pre-existing cells after the replay's writes land. Architecture-level operator guide at docs/sv_splitting_recovery.md, linked from docs/sv_splitting.md's Concurrency section.
Pass the op's timestamp from execute → _apply → split_supervoxels → split_supervoxel → copy_parents_and_add_lineage / add_new_edges so every new-SV bigtable mutation lands at the op's logical write time. Gets atomic visibility under a parent_ts filter and makes replay's override_ts actually control what time-filtered readers see after a repair_operation. Parent-copy and Child-list writes deliberately keep the old cell's timestamp so pre-op readers still see the old hierarchy. Replace the seven-tuple in/out soup in split_supervoxels with named dataclasses: SvSplitTask (plan_sv_splits → split_supervoxel input) and SvSplitOutcome (split_supervoxel's per-task output). Drop the two unused return fields on split_supervoxel.
list_stuck filter switches from Status==CREATED to "L2ChunkLockScope set and Status != SUCCESS past min_age". The authoritative signal for stuck-ness is "scope recorded, not cleared" — worker crash (Status stays CREATED) and Python exception during persist (Status=EXCEPTION after Fix 1) both fall under it. Ops without scope aren't blocking other ops and are outside stuck_ops' concern. replay now verifies each chunk in the recorded scope actually has Concurrency.IndefiniteLock held by this op_id before running cleanup or repair. If cells are missing or held by a different op, raises with a clear error. Protects against double-replay (first run already released cells) and out-of-band clearing (manual bigtable edit, buggy release path) — both would have cleanup_partial_writes revert chunks that aren't ours.
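The two checks above, sketched over plain dicts. The row layout (keys "L2ChunkLockScope", "Status", "created"), measuring age from a creation timestamp, and both function names are assumptions for illustration.

```python
def is_stuck(row, now, min_age):
    """Scope recorded, not SUCCESS, and older than min_age (seconds)."""
    has_scope = bool(row.get("L2ChunkLockScope"))
    not_success = row.get("Status") != "SUCCESS"
    old_enough = (now - row["created"]) >= min_age
    return has_scope and not_success and old_enough


def verify_scope_held(scope, lock_cells, op_id):
    """Raise unless every chunk in scope is held by op_id."""
    not_ours = [c for c in scope if lock_cells.get(c) != op_id]
    if not_ours:
        raise RuntimeError(
            f"op {op_id} does not hold indefinite locks on chunks {not_ours}"
        )


rows = [
    {"id": 1, "Status": "CREATED", "L2ChunkLockScope": [10, 11], "created": 0},
    {"id": 2, "Status": "CREATED", "L2ChunkLockScope": [], "created": 0},
    {"id": 3, "Status": "SUCCESS", "L2ChunkLockScope": [12], "created": 0},
]
stuck_ids = [r["id"] for r in rows if is_stuck(r, now=600, min_age=300)]
```

The ownership check is what makes a second replay fail fast: after the first run releases the cells, `verify_scope_held` raises before any cleanup touches chunks.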
fork_base_manifest is now an explicit step — invoked from the ingest CLI's --ocdbt path or the seg_ocdbt notebook — rather than being auto-triggered on first ws_ocdbt_scales access. ws_ocdbt_scales asserts fork_exists() so a missing fork fails with a clear pointer instead of a tensorstore mismatch/not-found error.
Stuck-op detection keys off L2ChunkLockScope being populated, not Status=CREATED — that filter also covers caught-exception paths (Status=FAILED) where cells are held but the row isn't CREATED. Recovery now verifies each chunk's IndefiniteLock is actually held by the op before cleaning up, so a stale scope can't have us revert chunks another op owns. Reflect both in docs and the list CLI help. Drop "mode-downsample" from the SV-split diagram — tinybrain owns the algorithm; the doc shouldn't pin it.
`_rep_bbox` enveloped every piece of the cross-chunk-connected rep — for physical SVs split into many pieces across chunks, the bbox grew far wider than the cut surface needs. Replace with `_coords_bbox`: envelope of the user-placed source/sink coords plus a one-chunk margin (matches the existing L2 lock margin and 1-voxel shell). After the seg read, `cut_supervoxels` is intersected with the IDs present in seg, so the "whole sv" set names only the rep pieces the bbox actually touches. Pieces of the rep outside the bbox keep their existing IDs — their cross-chunk edges to in-bbox split fragments are routed via the 1-voxel shell, edges between two unsplit pieces don't change. Adds `TestCoordsBbox` covering envelope+margin, volume-bound clipping, and that `plan_sv_splits` returns a tight bbox regardless of how distant the rep's other pieces sit.
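A minimal sketch of the envelope-plus-margin idea, assuming voxel coordinates, a half-open [lo, hi) box, and a margin of one chunk edge per axis. The function below is illustrative and is not the PR's `_coords_bbox`; it does not snap to chunk boundaries.

```python
def coords_bbox(coords, chunk_size, volume_min, volume_max):
    """Envelope of the coords, padded by one chunk, clipped to the volume."""
    lo, hi = [], []
    for axis in range(3):
        values = [c[axis] for c in coords]
        lo.append(max(min(values) - chunk_size[axis], volume_min[axis]))
        # +1 makes the upper bound exclusive before adding the margin.
        hi.append(min(max(values) + 1 + chunk_size[axis], volume_max[axis]))
    return tuple(lo), tuple(hi)


source_sink = [(100, 100, 10), (120, 90, 12)]
bbox = coords_bbox(source_sink, (64, 64, 16), (0, 0, 0), (1000, 1000, 20))
print(bbox)  # ((36, 26, 0), (185, 165, 20))
```

The box depends only on where the user clicked, so pieces of the rep far from the cut never widen it.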
handle_supervoxel_id_lookup and id_helpers.get_atomic_ids_from_coords both short-circuit to lookup_svs_from_seg whenever ocdbt_seg is true, regardless of node-id layer. 2D slice clicks send L1 IDs from a view that may be stale after an SV split; 3D mesh clicks send a root and no L1 at all. Either way the current SV is what matters, so we read seg at the click coords and let downstream same/different-root checks surface any staleness with the sv_id->root diagnostic. Also vectorize lookup_svs_from_seg's per-coord indexing into a single advanced-index op.
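The vectorization mentioned at the end can be sketched with NumPy advanced indexing. This assumes the seg block is already an in-memory 3D array and that the coords are relative to the block origin; `lookup_ids` is a hypothetical name, not the PR's lookup_svs_from_seg.

```python
import numpy as np


def lookup_ids(seg, coords):
    """Fetch seg[x, y, z] for every row of coords in one indexing op."""
    coords = np.asarray(coords, dtype=np.int64)
    return seg[coords[:, 0], coords[:, 1], coords[:, 2]]


seg = np.arange(24, dtype=np.uint64).reshape(2, 3, 4)
clicks = [(0, 0, 0), (1, 2, 3), (0, 1, 2)]
ids = lookup_ids(seg, clicks)
print(ids.tolist())  # [0, 23, 6]
```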
`_schema_from_src` was passing both `domain` and `shape` to ts.open when cloning the source schema to the destination. For sources with a non-zero `voxel_offset`, `domain` carries absolute bounds (e.g. [17756, 62244)) while `shape` implies an origin of 0, and tensorstore refuses to merge them — base creation fails on any precomputed source that doesn't start at the origin. `domain` already encodes both extent and offset, so passing only it is sufficient and avoids the conflict.
cloudbuild now uses BuildKit registry cache (`:buildcache`) so unchanged stages reuse the prior build's layer artifacts and already-warm nodes skip re-downloading them on pull. The fresh-ingest CLI no longer infers `ocdbt_populate_base` from `base_exists` — manifest presence didn't reflect whether chunks were actually copied. Replaced with an explicit `--populate-base` flag; operator sets it on first ingest and omits on subsequent runs.
`docker buildx create --use` stores the "current builder" client-side in ~/.docker/buildx/, which doesn't persist across cloudbuild steps — each step runs in a fresh container. The build step then fell back to the default `docker` driver, which doesn't support `--cache-to type=registry`. Cache export silently no-op'd, so the :buildcache image never got pushed and every subsequent build's --cache-from missed. Move the create + use into the same step as the build and explicitly set --driver docker-container.
create_base_ocdbt and wipe_base_ocdbt both wiped through the OCDBT driver, which on an empty dir creates a default-config manifest stub (max_inline_value_bytes=100) and on a populated dir only clears the B+tree leaving the manifest in place. Either way the subsequent open with OCDBT_CONFIG mismatches. Wipe through the underlying GCS/file driver so the manifest itself gets deleted. Move OCDBT base setup before cg.create() so a failure in OCDBT setup doesn't leave behind an orphan bigtable table.
…mic txn Move the precomputed→OCDBT base copy out of L2 atomic tasks into a single parent-layer task per chunk, layer configurable via --populate-layer (default 4). Each task batches every underlying L2 chunk across every MIP scale into one OCDBT commit via ts.Transaction(atomic=True) — without atomic=True the precomputed driver splits a multi-chunk write into two OCDBT sub-commits regardless of the surrounding python-level transaction. The chosen populate-layer is persisted in <ws>/ocdbt/.populated/meta.json alongside the per-chunk markers; mixing layers is unsupported, so a mismatch errors out and tells the operator to delete <ws>/ocdbt/ if they want to repopulate at a different layer.
… status + per-chunk node/edge logs `ingest graph --retry` now loads the existing IngestionManager from redis instead of requiring the OCDBT flags to be re-passed; `dataset` becomes optional under --retry. The base/populate-meta/cg.create one-time setup is skipped — only the per-CG fork manifest is re-wiped and L2 tasks are re-enqueued. The job_type_guard already handles cross-mode misuse. create_parent_chunk fires the OCDBT populate on a worker thread before add_parent_chunk so the GCS copy overlaps with the BigTable hierarchy build; the populate future is awaited (and exceptions propagated) before marking the task complete. print_status now surfaces ocdbt_seg / ocdbt_populate_base / ocdbt_populate_layer so operators can see at-a-glance what mode the running job is in. add_atomic_chunk and add_parent_chunk each log node and edge counts so each task's BigTable write volume is visible in the worker logs.
…le refresh
print_status now renders a single rounded Rich Panel via Live. Two
header mini-tables (graph fields, ocdbt fields when enabled) sit above
the per-layer progress/queue/busy table; both header groups share
computed column widths so their boundaries line up. job_type lives in
the Panel title so it stays out of the row data.
Busy-worker count comes straight from Redis: SMEMBERS rq:workers:l{N}
to enumerate worker keys per queue, then a single pipelined HGET state
across all of them and a sum of state == b'busy'. Two round-trips per
refresh regardless of worker count, displayed as busy/total.
Refresh cadence defaults to 5s and is overridable via --refresh on
both `ingest status` and `upgrade status`.
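The two-round-trip count could look roughly like this with a redis-py style client (`smembers`, `pipeline`, `hget`, `execute`). The function name and return shape are assumptions; the key pattern and the b'busy' comparison are from the commit message.

```python
def count_busy_workers(client, layer):
    """Return (busy, total) workers for one layer queue."""
    # Round trip 1: enumerate the worker keys registered for this queue.
    worker_keys = list(client.smembers(f"rq:workers:l{layer}"))
    if not worker_keys:
        return 0, 0
    # Round trip 2: one pipelined HGET per worker, sent together.
    pipe = client.pipeline()
    for key in worker_keys:
        pipe.hget(key, "state")
    states = pipe.execute()
    busy = sum(1 for state in states if state == b"busy")
    return busy, len(worker_keys)
```

The client is passed in, so the same function can run against a real `redis.Redis` instance or a stub in tests.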
…ations
Each refresh was constructing a fresh Queue plus a fresh
FailedJobRegistry per layer, and the first FailedJobRegistry access
triggered a lazy `from rq.registry import FailedJobRegistry` import,
which dominated first-call latency. The relevant keys are stable
strings (`rq:queue:l{N}`, `rq:failed:l{N}`, `rq:workers:l{N}`), so
compute them once before the live loop and pass them straight into
the pipeline.
Overlapping the GCS copy with add_parent_chunk via a worker thread doubled peak memory inside the worker (the multi-scale read buffer and the BigTable hierarchy state are both live), which was OOM-killing workers. Drop the ThreadPoolExecutor and let the populate run inline after add_parent_chunk completes; _populate_ocdbt_chunk stays as a named helper.
…ntermediate numpy Passing the source TensorStore directly into write() lets tensorstore copy through its own pipeline rather than materializing a uint64 numpy array in Python and then handing it back to write(). On a 512 MiB region the duplicate-buffer cost goes away — peak RSS drops by roughly one scale's worth. Inside the atomic transaction the txn itself still buffers staged writes, so this trims the avoidable half of memory, not all of it.
… config
Split ocdbt.py into ocdbt/{__init__,meta,utils,main}.py. OcdbtConfig
dataclass replaces the OCDBT_CONFIG module constant; spec builders
take it explicitly. Per-CG settings persist in
custom_data["ocdbt_config"]; legacy custom_data["seg"] still reads.
Drop --ocdbt, --populate-base, --populate-layer, --sv-split-threshold
from `ingest graph`; those move to yaml under a top-level
`ocdbt_config:` section. bootstrap() parses it.
IngestionManager carries one ocdbt_config dict with @property
accessors so cluster.py callers don't change.
max_inline_value_bytes default bumped to 1 MiB (tensorstore's hard
ceiling).
…hared with ws_cv
ChunkedGraphMeta.ocdbt_config now consults <ws>/ocdbt/.populated/meta.json at runtime and resolves it on top of custom_data per the info-file > custom_data > defaults order. New module-level _redis_cached_json(key, loader) caches the JSON dict in Redis so distributed workers hit Redis once per process instead of GCS per CG instance; ws_cv refactored to use the same helper.
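The precedence order (info-file over custom_data over defaults) amounts to a layered dict merge. The sketch below assumes a shallow merge; the function name and the `populate_layer` key are hypothetical.

```python
def resolve_ocdbt_config(defaults, custom_data=None, info_file=None):
    """Later layers win: defaults < custom_data < info file."""
    resolved = dict(defaults)
    resolved.update(custom_data or {})
    resolved.update(info_file or {})
    return resolved


resolved = resolve_ocdbt_config(
    defaults={"max_inline_value_bytes": 1048576, "populate_layer": 4},
    custom_data={"populate_layer": 3},
    info_file={"populate_layer": 5},
)
print(resolved)  # {'max_inline_value_bytes': 1048576, 'populate_layer': 5}
```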
…p edges-OCDBT path New ingest/ocdbt.py owns the per-chunk populate task, the shared base/fork setup, and a coordinator-server context manager that the populate-layer CLI will wire in next. populate_chunk fails loudly when the coordinator address isn't advertised in Redis — uncoordinated parallel commits race the shared manifest and leak orphan d/ files, the exact bug this module exists to prevent. The --ocdbt-edges branch and start_ocdbt_server (edges-only writer with a different base path) go away with their tests; no caller is left.
`flask ingest layer N` now wraps queue_layer_helper in a DistributedCoordinatorServer when N matches the OCDBT populate layer: the server starts in-process, its address is advertised in Redis, and the foreground blocks so workers route every commit through it. Eliminates the manifest-CAS races that were emitting orphan d/ files at ~47× the rate of committed data. Both CLIs lose --reset-ocdbt — wipe out of band with `gcloud storage rm -r gs://<ws>/ocdbt/`. The shared base/fork/persist dance moves out of both `graph` subcommands into the single setup_base helper. upgrade's legacy custom_data["seg"] write is also replaced by the canonical ocdbt_config schema setup_base persists.
…s.requeue_chunk
ingest_chunk and upgrade_chunk were identical aside from their L2 / L3+ function pair. The shared body lives in utils.requeue_chunk; each CLI's chunk command is one line that passes its own (atomic_fn, parent_fn).
bootstrap returns (meta, ingest_config, client_info, ocdbt_config_dict) since the OCDBT-package refactor, but the test still unpacked three values and raised at call site. Update the unpack and assert the new dict element is present.
queued sits at position 2 — the column you actually watch during a run. Layer centers under its header now that it's the lone left column. % becomes progress; done becomes completed. The composite workers [busy] column splits into two right-aligned columns (workers, busy) so the total and the active count line up by digit and don't visually fuse. Global padding bumps to (0, 2) for breathing room. Headers go through rich.text.Text so any future bracketed header would render literally instead of being parsed as Rich markup.
Workers were setting OCDBT_COORDINATOR_HOST/PORT env vars before opening
the base kvstore. The tensorstore binary doesn't reference those strings —
the OCDBT driver looks at the spec's `coordinator: {address: ...}` field
(or the `ocdbt_coordinator` Context resource), nothing else. So every
"distributed" populate worker has been writing direct, racing the shared
manifest, and emitting orphan d/ files: ~634 GiB of d/ at under 3% of
layer 3 done before this was caught.
open_base_ocdbt takes an optional coordinator_address and injects it into
the kvstore spec. populate_chunk forwards an address when one is passed
(distributed) and writes direct when none is (single-process / notebooks),
since uncoordinated single-writer is fine. cluster.create_parent_chunk
is the only distributed caller and is where the fail-loud lives: it
reads the address from Redis via get_coordinator_address and raises if
unset, so a worker can never silently regress to the env-var no-op.
Verified by opening an OCDBT kvstore with coordinator=localhost:1 — the
first write raises UNAVAILABLE, confirming the spec field is the actual
routing knob.
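A sketch of the spec injection and the fail-loud lookup described above. Both helper names are hypothetical, and `lookup` stands in for whatever reads the advertised address from Redis; only the `coordinator: {address: ...}` spec field is taken from the commit message.

```python
def base_kvstore_spec(base, coordinator_address=None):
    """OCDBT kvstore spec; routes commits via the coordinator when given."""
    spec = {"driver": "ocdbt", "base": base}
    if coordinator_address is not None:
        spec["coordinator"] = {"address": coordinator_address}
    return spec


def require_coordinator_address(lookup):
    """Fail loudly in distributed callers instead of writing direct."""
    address = lookup()
    if not address:
        raise RuntimeError("OCDBT coordinator address is not advertised")
    return address


# Single-process use (notebooks): no coordinator, direct writes.
direct = base_kvstore_spec("gs://example-bucket/ws/ocdbt/base/")
# Distributed use: the address must be present or the task raises.
coordinated = base_kvstore_spec(
    "gs://example-bucket/ws/ocdbt/base/",
    require_coordinator_address(lambda: "10.0.0.5:4242"),
)
```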
tensorstore's distributed-OCDBT path rejects cross-key atomic transactions: every populate task with the coordinator wired in was raising "Cannot read/write ... as single atomic transaction" the moment copy_ws_bbox_multiscale tried to span the info key and chunk keys (and across scales). The fix is just to drop atomic=True. The transaction object itself is what batches every per-scale .write() into one OCDBT commit; atomic only adds cross-key isolation, which we don't need here and which the coordinator path forbids.
Verified by reproducing both shapes:
* coord + atomic=True → fails immediately on info+chunk
* coord + Transaction() → single d/ file per task (multi-scale)
Concurrent-writer check under the coordinator: 8 threads × 20 commits produced 183 d/ files (≈1.14× overhead from manifest updates). Without the coordinator the same workload produced 1017 d/ files (6.4× orphan ratio), which matches the manifest-CAS-race behavior the coordinator fixes.
Each populate task now prints a layer/coords header line, then the multiscale commit prints a tight summary of how many voxels landed across how many scales. Per-scale detail stays at debug level so the default operator view stays one informative line per task — enough to see what's being done and how much, without scrolling.
New pychunkedgraph/graph/ocdbt/debug.py holds all the diagnostic plumbing: humanize_count, failure_envelope (host/pod/versions/traceback/timestamp/coordinator env), bbox_failure_payload (per-scale shape/chunk/key-count/raw-bytes + src+dst kvstore specs), and dump_failure_to_gcs (writes JSON to $ERROR_DUMP/<dump_tag>__<utc>.json). copy_ws_bbox_multiscale stays slim: when ERROR_DUMP is unset, no per-scale bookkeeping, no payload, no GCS write, and the happy path runs unchanged. When ERROR_DUMP is set, per-scale rows accumulate and on failure get serialized via the helpers. populate_chunk threads a <graph_id>/L<layer>/<x>_<y>_<z> dump_tag so each failing task self-identifies in the bucket and multiple experiments share one dump root without collisions.
`flask ingest purge_layer <N>` drops all per-layer redis state — the RQ
queue (deletes jobs along with it), every per-layer registry (failed,
started, deferred, scheduled, finished, canceled, addressed by the
registry's own .key so we don't hardcode RQ's internal key naming),
and the pychunkedgraph completion set f"{N}c". Confirmation prompt
gates the destructive action.
Used when you restore the table from a previous layer's backup and
need to re-enqueue from scratch — without this, stale queue + completion
state from the aborted run would shadow the fresh re-run.
Restructure `flask ingest graph`: --retry/-r reuses the IngestionManager from redis but otherwise runs the same setup path as a fresh ingest — opens the existing table via imanager.cg, calls setup_base when OCDBT is enabled (idempotent: recreates the base + info if absent, reconciles config, forks for this CG), and enqueues. Skipping only cg.create() means an externally-wiped OCDBT base now recovers on retry without touching Bigtable. New --skip-queue/-s does setup but no L2 enqueue, for operators who want to verify the OCDBT base / fork state before queueing work. Short forms for retry/-r, skip-queue/-s, test/-t.
In-repo notes on every OCDBT spec field, config sub-field, ocdbt_coordinator resource field, DistributedCoordinatorServer constructor field, and runtime defaults — verified empirically against the venv's tensorstore binary by probing with intentional-bad-value plus spec round-trips. Captures the distributed-mode constraints: atomic transactions forbidden, cooperator-forwarded RPC bounded at gRPC's 4 MiB default with no Python knob to raise it, leases held per btree node so disjoint user-key writes still cross-forward. Catalogs the levers that do exist (max_decoded_node_bytes, max_inline_value_bytes, lease_duration) and ones that look promising but aren't fields.
Reading tensorstore's distributed btree_writer.cc StagePending: values ≤ max_inline_value_bytes are carried inline in the encoded mutation, so the cooperator's per-leaf WriteRequest packs all of them together. With the previous 1 MiB threshold, every compressed_segmentation chunk that landed below 1 MiB stayed inline and contributed its full bytes to the gRPC forward — which blows past tensorstore's hardcoded 4 MiB gRPC max-receive whenever a few inline chunks land on the same btree leaf. Dropping to 4 KiB pushes every chunk value through WriteData into a d/ file and replaces it in the mutation with an IndirectDataReference, so forwarded RPCs carry only small refs regardless of value size. Small metadata (info JSON, populate-marker files) still stays inline. Tradeoff: the old comment cited "7× GCS bloat" at the 100-byte default — that ratio came from per-chunk independent zstd framing dominating tiny payloads, not chunk-sized ones. At 4 KiB+ payloads the per-chunk zstd overhead is in the single-digit-percent range. Reference doc updated to credit max_inline_value_bytes as the actual RPC-size lever and correct the cooperator-batching section.
tensorstore raises ValueError with an absl status-code prefix on transport-level failures. A DNS hiccup or a GCS 5xx in is_chunk_populated or mark_chunk_populated was killing the whole populate task. Wrap the four marker / populate-meta helpers in a tenacity retry that matches UNAVAILABLE / DEADLINE_EXCEEDED / ABORTED / INTERNAL prefixes only — NOT_FOUND, INVALID_ARGUMENT, RESOURCE_EXHAUSTED still propagate.
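The retry predicate can be sketched in plain Python as below. The PR uses tenacity; this stdlib loop is a stand-in, and it assumes the status-code name appears at the start of the exception message. `is_transient` and `call_with_retry` are hypothetical names.

```python
import time

TRANSIENT_PREFIXES = ("UNAVAILABLE", "DEADLINE_EXCEEDED", "ABORTED", "INTERNAL")


def is_transient(exc):
    """True only for ValueErrors whose message starts with a retryable code."""
    return isinstance(exc, ValueError) and str(exc).startswith(TRANSIENT_PREFIXES)


def call_with_retry(fn, attempts=3, delay_s=0.0, sleep=time.sleep):
    """Call fn, retrying transient failures; anything else propagates at once."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except ValueError as exc:
            if not is_transient(exc) or attempt == attempts:
                raise
            sleep(delay_s)
```

Matching on an allow-list keeps errors such as NOT_FOUND or INVALID_ARGUMENT visible immediately, since retrying them cannot succeed.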
…ask-body mode flags
`ingest layer N` now owns the whole OCDBT lifecycle when N matches
ocdbt_populate_layer: idempotently runs setup_base (base + fork +
config reconcile), starts the coordinator, queues tasks. Moves
setup_base out of `ingest graph` so there's a single command that
manages OCDBT. Re-pickles the IngestionManager so workers see the
resolved config.
Adds three flags to `ingest layer`:
--queue-only/-q : skip the coordinator (one is running elsewhere).
--ocdbt-only/-o : workers run only OCDBT populate, skipping
add_parent_chunk. For re-populating a freshly created/wiped OCDBT
base against an already-built bigtable graph.
--ingest-only/-i : workers run only add_parent_chunk, skipping
OCDBT populate. For rebuilding bigtable against an existing
OCDBT base.
Refactors create_parent_chunk to a single function with a `mode`
parameter bound at queue time via functools.partial — no duplicated
OCDBT-populate block. _post_task_completion runs unconditionally so
layer-progress tracking in redis stays consistent across modes.
Adds IngestionManager.is_ocdbt_populate_layer(layer) so the compound
guard isn't repeated at every call site.
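A sketch of binding the mode at queue time with functools.partial. The mode strings and the recorded step names are placeholders, and the function body only records which steps would run; the ordering (populate after add_parent_chunk, completion bookkeeping always last) follows the commit messages above.

```python
from functools import partial


def create_parent_chunk(layer, coords, mode="both"):
    """Return the steps a task would run for the given mode (illustrative)."""
    if mode not in ("both", "ocdbt_only", "ingest_only"):
        raise ValueError(f"unknown mode: {mode}")
    steps = []
    if mode in ("both", "ingest_only"):
        steps.append("add_parent_chunk")
    if mode in ("both", "ocdbt_only"):
        steps.append("ocdbt_populate")
    # Completion bookkeeping runs in every mode so layer progress stays consistent.
    steps.append("post_task_completion")
    return steps


# Bound once when queueing; every enqueued task then shares the mode.
task_fn = partial(create_parent_chunk, mode="ocdbt_only")
print(task_fn(4, (0, 0, 0)))  # ['ocdbt_populate', 'post_task_completion']
```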
Summary
Supervoxel splits in OCDBT-backed segmentations are now atomic under worker failure, with an operator tool for recovering stuck ops.
Locks
Two indefinite locks, IndefiniteRootLock and IndefiniteL2ChunkLock, both holding their cells on exception.
On unclean exit, both hold and the L2 scope remains written to the op-log row.
Stuck-op signal
Non-empty scope field on the op-log row. Covers both `CREATED` (worker crash) and `FAILED` (caught exception) paths; status alone missed `FAILED`. Clean exit clears it.
Cleanup-then-replay
Concurrent ops on non-overlapping chunks advance OCDBT during the outage, so recovery can't use one pinned view.
OCDBT time-travel
Tensorstore `version` field in the spec (integer generation or ISO-8601 `Z` timestamp, interpreted as `commit_time ≤ T`). Threaded as `pinned_at`. Pinned handles are read-only.
Reader contract
`dataset_info` now publishes the full kvstore spec (kvstack layers + OCDBT config + data prefixes) instead of path fragments, so readers pass it verbatim to tensorstore. The multi-scale open asserts fork presence, failing with a clear message instead of a tensorstore internal.