feat: seg contact training, inference, and data pipeline by mgschm · Pull Request #1216 · ZettaAI/zetta_utils

mgschm · 2026-04-10T13:51:20Z

Summary

Seg contact layer: backend, procs (resampling, normalization, augmentation), tensor_utils for shared training/inference conversion
SegContactDataset: affinity filtering, channel masking, noise augmentation, per-point affinity channel
RebatchingDataLoader: variable-size chunk → fixed-batch conversion with shuffle buffer
PointNet architecture: registration for pointcloud classification/segmentation
Seg contact compare op: validation metrics and HTML report generation
Training pipeline: multi-val-dataloader, ValCheckIntervalGuard, _get_worker_init_fn with CUDA disable
Subchunkable/volumetric apply flow: max_random_chunks option
Convnet: autocast toggle for load_and_run_model
Comprehensive test suite (100% coverage on seg_contact_dataset.py and data_loader.py)
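
The RebatchingDataLoader item above describes converting variable-size chunks into fixed-size batches through a shuffle buffer. A minimal sketch of that idea, assuming chunks are plain lists of samples; `rebatch`, `buffer_size`, and the drop-last behaviour are illustrative choices, not the PR's actual API:

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def rebatch(
    chunks: Iterable[List[T]],
    batch_size: int,
    buffer_size: int,
    seed: int = 0,
) -> Iterator[List[T]]:
    """Yield fixed-size batches from variable-size chunks (hypothetical helper)."""
    rng = random.Random(seed)
    buffer: List[T] = []
    for chunk in chunks:
        buffer.extend(chunk)
        # Emit only while the buffer is full enough to mix samples across chunks.
        while len(buffer) >= max(buffer_size, batch_size):
            rng.shuffle(buffer)
            yield buffer[:batch_size]
            del buffer[:batch_size]
    # Drain the remainder; a final partial batch is dropped in this sketch.
    rng.shuffle(buffer)
    while len(buffer) >= batch_size:
        yield buffer[:batch_size]
        del buffer[:batch_size]


batches = list(rebatch([[1, 2, 3], [4], [5, 6, 7, 8, 9]], batch_size=4, buffer_size=6))
```

A larger `buffer_size` mixes samples from more chunks at the cost of memory and start-up latency.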

Submodule PRs

internal: https://github.com/ZettaAI/internal/pull/189
specs: direct merge to main (no branch protection)

Test plan

288 seg_contact + data_loader tests passing
Full pytest suite: 2052 passed, 80 skipped, 1 flaky (pre-existing SQS timing test)
99% overall coverage
black + isort formatting applied
pylint 9.87/10 (all issues pre-existing)
mypy: no new errors (110 pre-existing across 12 files, same count on main)
CI passes

🤖 Generated with Claude Code

Add try/except import in architecture/__init__.py so pointcloud network registrations are loaded when zetta_utils modules are loaded. The import is wrapped in try/except to handle environments without the pointnet package installed. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
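
The guarded import described above can be sketched as follows. This is an illustration only: `try_register` is a hypothetical helper, and the module names are stand-ins rather than the ones used in architecture/__init__.py.

```python
import importlib
import logging

logger = logging.getLogger(__name__)


def try_register(module_name: str) -> bool:
    """Import a module for its registration side effects; tolerate its absence."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError, so it is covered here.
        logger.debug("Optional module %s is not installed; skipping.", module_name)
        return False
    return True


registered = try_register("json")  # stdlib module, always importable
missing = try_register("surely_not_installed_pkg_xyz")
```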

- Add resample_points() with distance-weighted sampling (uniform, inverse_r, inverse_r2)
- Add resample_pointclouds() for per-segment and contact-face resampling
- Add resample_combined_pointcloud() for combined pointcloud resampling
- Add deduplicate_pointclouds() to remove duplicate points
- Add apply_random_flip() for random axis flipping augmentation
- Add randomize_segment_identity() to swap seg_a/seg_b labels
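
A rough sketch of distance-weighted resampling with the three modes named above. The signature, the choice of measuring distance from a `center` argument, and the with-replacement fallback are assumptions for illustration, not the PR's implementation:

```python
import numpy as np


def resample_points_sketch(points, n_out, center, mode="uniform", seed=0):
    """Pick n_out points, weighting by distance r to `center` (hypothetical)."""
    rng = np.random.default_rng(seed)
    r = np.linalg.norm(points - center, axis=1)
    r = np.maximum(r, 1e-6)  # avoid division by zero at the center
    if mode == "uniform":
        weights = np.ones_like(r)
    elif mode == "inverse_r":
        weights = 1.0 / r
    elif mode == "inverse_r2":
        weights = 1.0 / r**2
    else:
        raise ValueError(f"unknown mode: {mode}")
    probs = weights / weights.sum()
    # Sample with replacement only when there are too few input points.
    replace = len(points) < n_out
    idx = rng.choice(len(points), size=n_out, replace=replace, p=probs)
    return points[idx]


pts = np.random.default_rng(1).normal(size=(500, 3))
sampled = resample_points_sketch(pts, 64, center=np.zeros(3), mode="inverse_r2")
```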

- Add contact_faces_original_nm field to preserve pre-normalization coordinates
- Change segment labels from 0/1 to -1/+1 encoding
- Add optional 5th affinity channel (per_point or mean mode)
- Add channel masking support with global/local/random modes
- Add affinity noise augmentation
- Include info_path in output for visualization support

- Add format_version attribute to SegContactLayerBackend (default 1.1)
- Load format_version from info file (defaults to 1.0 for old data)
- Conditionally read representative_points only for format >= 1.1
- Make representative_points optional (None for format 1.0)
- Add explicit error checks for missing representative_points:
  - Reading format >= 1.1: fail if data missing
  - Writing: fail if representative_points is None
- Handle None representative_points in randomize_segment_identity
- Use packaging.version.Version for proper version comparison
- Update test to expect format_version 1.1 for new backends
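
To illustrate why the commit compares versions with packaging.version.Version rather than as strings or floats, here is a small sketch. `has_representative_points` is a hypothetical helper, and it assumes the version is stored as a string:

```python
from packaging.version import Version


def has_representative_points(format_version: str) -> bool:
    """True when the format is expected to carry representative_points."""
    return Version(format_version) >= Version("1.1")


old = has_representative_points("1.0")
new = has_representative_points("1.1")
# A plain string comparison would rank "1.10" below "1.2";
# Version compares the numeric components instead.
later = has_representative_points("1.10")
string_order_is_wrong = "1.10" < "1.2" and Version("1.10") > Version("1.2")
```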

Previously the write path always required representative_points, even when writing format 1.0 data. Now properly checks format_version before writing representative_points section.

- Add format_version field (default 1.1 for new layers)
- Inherit format_version from source in from_reference method
- Use params.format_version in make_info instead of hardcoded value

…version Extract pointcloud-to-tensor conversion into shared module to ensure consistent segment labeling between training and inference. Refactor SegContactDataset to use new utilities.

New operation to run PointNet contact merge inference and write merge_probabilities to seg_contact layer.

Script to compute aggregate AUC-PR/AUC-ROC metrics across all chunks by reading written merge_probabilities and comparing to ground truth.

…ntact data
- Make contact_faces optional on SegContact (default None)
- Skip _write_contacts_chunk when all contact_faces are None
- Simplify ContactMergeOp to build minimal SegContact with only id, seg_a, seg_b, com, and merge_probabilities

…ination
- Add required --source-path flag for contacts and GT data
- Use os.path.join for trailing-slash safety in GCS paths
- Switch mean affinity to pytorch nonzero-mask computation
- Filter contacts by COM within bbox (matching backend behavior)
- Generate chunk keys from info file grid instead of --chunk-size
- Add early validation that prediction files exist at expected path

…, and chunk exclusion
- Interactive histogram dashboard for seg contact filter stats with threshold sliders, GT-stacked bars, and neuroglancer popup links
- Annotation system: per-contact correct/wrong/unclear labels with notes, localStorage persistence, import/export, annotation-colored bar mode with 4 categories (correct/wrong/unclear/unannotated)
- Per-GT category sample tables with deterministic ordering (seeded PRNG), pagination, and weighted annotation summaries
- GT segment refs (gt_refs_a/b) written to parquet and selected in neuroglancer Ground Truth layer links
- Chunk exclusion filter: text input with range support + ctrl+click toggle
- Tests for segment metrics and contact filter stats helpers

…lows Allows processing a random subset of chunks (seeded for reproducibility) instead of all chunks, useful for sampling-based test runs or stats collection.
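
A seeded random subset of chunks, as described above, could look roughly like this. `pick_chunks` and its `seed` parameter are hypothetical names; only `max_random_chunks` comes from the commit message:

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def pick_chunks(
    chunks: Sequence[T], max_random_chunks: Optional[int], seed: int = 0
) -> List[T]:
    """Return all chunks, or a reproducible random subset of them."""
    if max_random_chunks is None or max_random_chunks >= len(chunks):
        return list(chunks)
    rng = random.Random(seed)  # local RNG, so global random state is untouched
    picked = rng.sample(range(len(chunks)), max_random_chunks)
    return [chunks[i] for i in sorted(picked)]  # keep the original chunk order


all_chunks = [f"chunk_{i}" for i in range(100)]
subset_a = pick_chunks(all_chunks, max_random_chunks=5, seed=42)
subset_b = pick_chunks(all_chunks, max_random_chunks=5, seed=42)
```

Using a dedicated `random.Random` instance keeps the selection reproducible regardless of what else consumes the global RNG.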

Add optional nucleus_layer param to SegContactOp. In filter stats mode, reads nucleus segmentation at finest available resolution and records has_nucleus flag per chunk in parquet. Production mode is unaffected — nucleus filtering deferred to training pipeline.

Store nucleus segmentation path in dataset info for neuroglancer links.

Dashboard changes: nucleus checkbox in chunk panel to exclude chunks with nuclei, chunk bar dimming respects unfiltered mode, click-to-highlight on chunk bars, and nucleus segmentation added as hidden layer in ngl links.

…s, mesh LRU cache, np.isin dtype fix
- Make reference_layer optional in SegContactOp (no GT needed for production)
- Add constraint_layers dict for filtering contacts by dominant segment labels
- Add MeshLRUCache with memory budget using cachetools.LRUCache
- Sort pairs by segment frequency to maximize cache hits
- Parallelize mesh downloads with ThreadPoolExecutor (num_procs)
- Add mesh_lod parameter for downloading lower-resolution meshes
- Add early affinity filter before expensive boundary/COM filters
- Vectorize pair-level filters using pandas reindex
- Fix np.isin dtype bug that falsely blacklisted ~15% of segments with large uint64 IDs
- Add model.eval() for ckpt+json model loading (fixes batchnorm in eval mode)
- Add csv_output_path to ContactMergeOp for per-chunk score export
- Add skip-if-exists check in SegContactOp
- Add NaN-to-zero clamping for model output
- Handle MeshMissingError in mesh downloads

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
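
The memory-budgeted mesh cache mentioned above is built on cachetools.LRUCache in the PR. The sketch below is a standard-library stand-in that shows the same idea (evict least recently used entries until the total size fits a byte budget); `ByteBudgetLRU` and its methods are hypothetical names, not the PR's MeshLRUCache:

```python
from collections import OrderedDict


class ByteBudgetLRU:
    """LRU cache bounded by total payload size instead of entry count."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used = 0
        self._items = OrderedDict()  # key -> (value, nbytes), oldest first

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key][0]

    def put(self, key, value, nbytes):
        if key in self._items:
            self.used -= self._items.pop(key)[1]
        if nbytes > self.max_bytes:
            return  # too large to cache at all
        self._items[key] = (value, nbytes)
        self.used += nbytes
        while self.used > self.max_bytes:
            _, (_, old_bytes) = self._items.popitem(last=False)
            self.used -= old_bytes


cache = ByteBudgetLRU(max_bytes=100)
cache.put("a", "mesh-a", 60)
cache.put("b", "mesh-b", 30)
cache.get("a")                # "a" becomes the most recently used entry
cache.put("c", "mesh-c", 40)  # over budget: evicts "b", the least recently used
```

Processing segment pairs so that frequently occurring segments are visited close together, as the commit does, raises the hit rate of a cache like this.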

…ing, remove zero-padding assumptions Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…p_chunks_with_nucleus option, MeshMissingError compat Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…_memory, precomputed mean_affinity, ValCheckIntervalGuard Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…thority filtering, PR curve axis controls Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…odel Aligns with the criteria/loss_weights refactor in ContactMergeRegime: each criterion key (here "merge") maps to a batch key. Targets are now [B,1] to match the PointNet model output shape, avoiding ambiguous broadcasting in the loss computation. Bumps internal submodule. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
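
The broadcasting ambiguity mentioned above is easy to reproduce. The sketch uses NumPy as a stand-in, since PyTorch follows the same broadcasting rules; the squared-error expression is only an example, not the loss used in ContactMergeRegime:

```python
import numpy as np

B = 4
pred = np.zeros((B, 1))            # model output, shape [B, 1]
target_flat = np.ones(B)           # targets stored as shape [B]
target_col = target_flat[:, None]  # targets reshaped to [B, 1]

bad = (pred - target_flat) ** 2    # silently broadcasts to shape [B, B]
good = (pred - target_col) ** 2    # stays elementwise, shape [B, 1]
```

With mismatched shapes no error is raised; the loss is simply averaged over B*B spurious pairs, which is why matching the target shape to the model output is the safer convention.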

Add dominant_ref_only mode to _compute_seg_to_ref_by_segment that keeps only the highest-overlap reference per segment, avoiding false-positive merges from sliver overlaps at misaligned supervoxel boundaries. SegContactOp now writes both gt_merge_label_set and gt_merge_label_dominant columns (plus gt_dominant_ref_a/b) so downstream analysis can compare criteria without re-running. Visualization gains a set-vs-dominant toggle and NG links switch between full ref sets and dominant refs accordingly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
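
The difference between the set-based and dominant-reference criteria can be sketched as below. The overlap table layout, the helper names, and the rule "two segments should merge if they share a reference ID" are assumptions made for illustration:

```python
from typing import Dict, Set

# overlap[segment_id][ref_id] = number of overlapping voxels (assumed layout)
Overlap = Dict[int, Dict[int, int]]


def seg_to_refs(overlap: Overlap, dominant_ref_only: bool) -> Dict[int, Set[int]]:
    """Map each segment to all overlapping refs, or only the largest one."""
    out: Dict[int, Set[int]] = {}
    for seg, refs in overlap.items():
        if dominant_ref_only and refs:
            out[seg] = {max(refs, key=lambda ref: refs[ref])}
        else:
            out[seg] = set(refs)
    return out


def merge_label(refs_a: Set[int], refs_b: Set[int]) -> bool:
    """Assumed criterion: merge when the two segments share a reference."""
    return bool(refs_a & refs_b)


# Segment 10 is mostly ref 1 but has a 5-voxel sliver of ref 2.
overlap = {10: {1: 900, 2: 5}, 11: {2: 800}}
by_set = seg_to_refs(overlap, dominant_ref_only=False)
by_dom = seg_to_refs(overlap, dominant_ref_only=True)
label_set = merge_label(by_set[10], by_set[11])
label_dominant = merge_label(by_dom[10], by_dom[11])
```

Here the sliver makes the set-based label positive while the dominant-reference label stays negative, which is the false-positive case the commit describes.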

Read-proc that zeros out segmentation voxels whose IDs are not in a neuroglancer_segment_properties allowlist, enabling on-the-fly proofreading masks. Properties file is fetched once per worker via lru_cache. Includes dtype-safe ID comparison with overflow detection. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
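
A minimal sketch of allowlist masking with a dtype-safe comparison. `mask_to_allowlist` is a hypothetical name, the allowlist is passed in directly rather than fetched from a segment properties file, and the overflow check is one plausible reading of "overflow detection":

```python
import numpy as np


def mask_to_allowlist(seg: np.ndarray, allowed_ids) -> np.ndarray:
    """Zero every voxel whose ID is not in allowed_ids (hypothetical helper)."""
    info = np.iinfo(seg.dtype)
    ids = [int(i) for i in allowed_ids]
    bad = [i for i in ids if i < info.min or i > info.max]
    if bad:
        raise OverflowError(f"IDs do not fit in {seg.dtype}: {bad[:3]}")
    # Cast the allowlist to the segmentation dtype so np.isin compares
    # like with like instead of promoting large uint64 IDs to float.
    allowed = np.array(ids, dtype=seg.dtype)
    out = seg.copy()
    out[~np.isin(seg, allowed)] = 0
    return out


big = 2**63 + 5
seg = np.array([[1, 2], [big, 7]], dtype=np.uint64)
masked = mask_to_allowlist(seg, {2, big})
```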

Previously affinity noise was added to all points, including segment points (which have 0 in the affinity channel by construction), injecting spurious non-zero values. Noise is now masked to contact-face (CF) points only, selected via the segment label channel (label != ±1). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
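
A sketch of the masked noise step, assuming a point array of shape [N, 5] with columns x, y, z, segment label, affinity. The column indices and the function name are assumptions; only the rule "segment points carry label -1 or +1, everything else is a contact-face point" comes from the commit message:

```python
import numpy as np


def add_affinity_noise(points, noise_std, seed=0, label_ch=3, aff_ch=4):
    """Add Gaussian noise to the affinity channel of contact-face points only."""
    rng = np.random.default_rng(seed)
    out = points.copy()
    is_cf = np.abs(out[:, label_ch]) != 1.0  # segment points have label -1 or +1
    noise = rng.normal(0.0, noise_std, size=len(out))
    out[is_cf, aff_ch] += noise[is_cf]
    return out


points = np.array(
    [
        [0.0, 0.0, 0.0, -1.0, 0.0],  # segment A point, affinity 0 by construction
        [1.0, 0.0, 0.0, 1.0, 0.0],   # segment B point
        [0.5, 0.0, 0.0, 0.0, 0.8],   # contact-face point
        [0.5, 0.1, 0.0, 0.0, 0.6],   # contact-face point
    ]
)
noisy = add_affinity_noise(points, noise_std=0.05)
```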

Runs zetta run for all (dataset, variant, agg_level) combinations with configurable parallelism. Supports per-dataset variant/agg lists, temp cue files to avoid edit races, and per-job log files. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add tests for tensor_utils, procs (resample, dedup, flip, identity swap, normalize with use_pointcloud_radius), and dataset helpers (pad/truncate, channel masking, affinity filtering). Fix 9 pre-existing broken tests in test_seg_contact_dataset caused by target->merge rename and empty-return logic that were never updated. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add comprehensive tests for RebatchingDataLoader helpers (_squeeze_chunks, _cat_chunks, _buf_len, _buf_index, _buf_slice, _tensor_rebatch, _prefetch, _pin_batches) and end-to-end integration. Add dataset tests for contact_faces_original_nm, affinity noise, channel masking, mean affinity mode, contact_label=None, and missing config key skipping. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Regenerate requirements.modules.txt to include pointnet (needed for Docker image builds). Update internal submodule with configurable optimizer for ContactMergeRegime. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

… subprocesses The builder context manager in building.py sets CURRENT_BUILD_SPEC to the full JSON-serialized spec, which reaches several MB for large training configs. Leaving it in os.environ caused execve-based subprocesses (wandb-core, DDP workers, spawn DataLoader workers) to inherit the bloated environment and crash with "Argument list too long". The variable is now popped before it is consumed; downstream code uses ZETTA_RUN_SPEC_PATH or trainer.log_config. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
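
The pop-before-consume pattern can be sketched as follows. Only the variable name CURRENT_BUILD_SPEC comes from the commit; the payload here is a small stand-in, and the child process merely reports whether it inherited the variable:

```python
import os
import subprocess
import sys

# Stand-in for the serialized spec; the real one can be several MB.
os.environ["CURRENT_BUILD_SPEC"] = "x" * 1000

# Read the value and remove it from the environment in one step, so that
# processes started afterwards do not inherit it.
spec = os.environ.pop("CURRENT_BUILD_SPEC", None)

child = subprocess.run(
    [sys.executable, "-c", "import os; print('CURRENT_BUILD_SPEC' in os.environ)"],
    capture_output=True,
    text=True,
    check=True,
)
child_sees_spec = child.stdout.strip()
```

The environment is passed to execve together with the argument list, so an oversized variable can push a child process over the kernel's limit even when its command line is short.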

…and dict-return from mesh.exists
- gt_dominant_ref_{a,b}: build via pd.array(..., dtype="UInt64") instead of Series.map(); the map() path coerced None-bearing columns to float64, silently rounding IDs above 2^53. Old parquets need regeneration.
- Skip mesh.exists call when info["mesh"] is None (saves a GCS RTT) and set has_mesh_* = False.
- mesh_cv.mesh.exists() returns dict for some source classes; iterate dict.items() to avoid the bare-zip bug that marked every seg as has-mesh on dict returns.
- Add asserts in SegContactOp / AddPointcloudsOp on layers without a mesh field, with actionable guidance (run igneous meshing or use collect_filter_stats_only=True).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
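
The rounding problem behind the UInt64 fix can be shown in a few lines. The sketch uses a plain float conversion to stand in for what a float64 column stores, and pandas' nullable "UInt64" dtype for the fixed path; the example ID is arbitrary:

```python
import pandas as pd

big_id = 2**53 + 1  # smallest positive integer a float64 cannot represent

# What happens to the ID once a column holding it is coerced to float64.
rounded = int(float(big_id))

# The nullable UInt64 extension dtype keeps the exact value alongside missing entries.
refs = pd.array([big_id, None], dtype="UInt64")
exact = int(refs[0])
missing = refs[1] is pd.NA
```

Because float64 has a 53-bit significand, every ID above 2^53 stored in a float column may be off by one or more, which is why previously written parquets have to be regenerated rather than repaired.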

…ex backslashes in chunk-exclude JS
- Detect parquets written before the UInt64 fix (gt_dominant_ref_{a,b} with float dtype) and warn that GT IDs > 2^53 are rounded and NGL links point at nonexistent segment IDs.
- Escape \\d and \\s in the parseChunkExcludeInput regex literals (the Python f-string was eating the single backslash).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Pulls in MEC seg-contacts spec + train/inference spec updates. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

supersergiy and others added 30 commits April 10, 2026 12:25

Add point cloud network support (PointNet, PointNet++, DGCNN)

95840f0

update requirements

fcea913

feat: update internal submodule with contact_merge regime enhancements

14dd27a

feat: update specs submodule with contact merge training configs

9291be4

fix: only write representative_points for format_version >= 1.1

f721e41

Previously the write path always required representative_points, even when writing format 1.0 data. Now properly checks format_version before writing representative_points section.

feat: add format_version support to SegContactInfoSpecParams

d448d5f

- Add format_version field (default 1.1 for new layers)
- Inherit format_version from source in from_reference method
- Use params.format_version in make_info instead of hardcoded value

feat(seg_contact): add tensor_utils for shared training/inference con…

bbf9859

…version Extract pointcloud-to-tensor conversion into shared module to ensure consistent segment labeling between training and inference. Refactor SegContactDataset to use new utilities.

feat(seg_contact): add ContactMergeOp for inference

f0ea618

New operation to run PointNet contact merge inference and write merge_probabilities to seg_contact layer.

feat(scripts): add compute_contact_merge_metrics.py

058214f

Script to compute aggregate AUC-PR/AUC-ROC metrics across all chunks by reading written merge_probabilities and comparing to ground truth.

chore: update specs and internal submodules

28e3a39

feat(PR/ROC curves): add PR/ROC curves PDF output to script

196075e

feat: cuda semaphore, cuda cache release

4fd23d2

chore: update internal submodule

923eea7

feat: add optional max_offtarget_vx/fraction (+offtarget_includes_unc…

a4716aa

…laimed flag), max_unclaimed_vx/fraction, min_interface_gt_fraction filters

chore: update specs and internal submodules

0a62833

feat: add max_random_chunks option to subchunkable/volumetric apply f…

b91e1c5

…lows Allows processing a random subset of chunks (seeded for reproducibility) instead of all chunks, useful for sampling-based test runs or stats collection.

chore: updated specs submodule

72a9adc

feat: add nucleus_path to seg_contact info spec

9daab1a

Store nucleus segmentation path in dataset info for neuroglancer links.

feat: add nucleus filter and ngl layer to filter stats dashboard

61f8b1d

Dashboard changes: nucleus checkbox in chunk panel to exclude chunks with nuclei, chunk bar dimming respects unfiltered mode, click-to-highlight on chunk bars, and nucleus segmentation added as hidden layer in ngl links.

chore: update specs submodule

de9a690

bfix: respect version of seg contact dataset

9eb7ffe

chore: updated specs

d12e0c2

mgschm and others added 23 commits April 10, 2026 13:03

feat: add autocast toggle to load_and_run_model, disable for ContactM…

453520c

…ergeOp

chore: update specs

4f37b9a

chore: update submodules, add .claude/ to .gitignore

53c9a23

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

docs: add logging guideline to CLAUDE.md

221aaa9

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat: seg contact layer - cache info reads, fast-path uniform resampl…

e6abe76

…ing, remove zero-padding assumptions Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat: add SegContactCompareChunkOp for comparing contact outputs, ski…

150b306

…p_chunks_with_nucleus option, MeshMissingError compat Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat: training pipeline - multi-val-dataloader, rebatching filter/pin…

f1cd83c

…_memory, precomputed mean_affinity, ValCheckIntervalGuard Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat: visualization - skip chunks without predictions, per-dataset au…

d0224f3

…thority filtering, PR curve axis controls Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

chore: updated specs

61f245d

test: cover stochastic channel mask skip path with seeded test

e66c6e6

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

test: cover empty pointclouds debug log path in dataset

bb8b88d

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

test: cover pin_memory, prefetch, and scalar cat_chunks paths

38d65ff

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

chore: update specs submodule to branch tip

a878f82

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

style: apply black and isort formatting

d9e1762

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

chore: update specs submodule (inference model checkpoint)

1ab8e7f

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

mgschm requested review from dodamih and supersergiy April 10, 2026 13:54

mgschm and others added 5 commits April 10, 2026 21:38

chore: bump specs submodule to martin/seg-contact-dev tip

f3856cd

Pulls in MEC seg-contacts spec + train/inference spec updates. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

feat: seg contact training, inference, and data pipeline #1216
mgschm wants to merge 65 commits into main from martin/seg-contact-dev

3 participants
