Enhance segmentation with DINO backend and kinematic tracking #14
Merged
Add merge_nearby_labels() post-processing step that greedily merges adjacent watershed fragments whose intensity-weighted centroids are within a configurable distance. This addresses over-segmentation, where a single Bragg spot gets split into multiple watershed basins.

Parameter changes (data-driven, from a sweep on the bundled scans):
- min_seed_separation: 1 → 2 (halves over-splitting, spread=1)
- CLI adds --threshold-fraction and --merge-distance (default 15)
- Notebook segment() uses merge_nearby_labels with distance=15

Before: 35/36/33 spots across 3 scans (spread=3, over-segmented)
After: 18/17/16 spots across 3 scans (spread=2, better consistency)

https://claude.ai/code/session_015Y9zQk4A8uKJAorKuvBoCk
DINOv3 patch-level features → PCA → HDBSCAN clustering → 3D slice stitching via union-find, replacing hand-tuned intensity-domain parameters with learned feature-space representations that generalise across beamlines and detectors.

- Add PatchFeatureEncoder protocol + MockPatchEncoder + TorchDinoPatchEncoder
- Add segment_dino() pipeline with slice extraction, clustering, upsampling, 3D stitching, and Otsu foreground masking
- Wire --method classical|dino flag and DINO-specific CLI args
- Add scikit-learn>=1.3 to core dependencies (for HDBSCAN)
- Add 16 tests covering mock encoder, clustering, stitching, and end-to-end

https://claude.ai/code/session_015Y9zQk4A8uKJAorKuvBoCk
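The union-find stitching step can be sketched as follows, assuming per-slice cluster labels stacked into a `(Z, H, W)` array with 0 as background, and assuming two regions in consecutive slices belong to the same 3D object when they overlap by at least `min_overlap` pixels. `UnionFind` and `stitch_slices_3d` are hypothetical names; the real pipeline may use a different linking criterion.

```python
import numpy as np


class UnionFind:
    """Disjoint-set forest over integer ids 0..n-1."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, x: int) -> int:
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[max(ra, rb)] = min(ra, rb)  # smaller id becomes the root


def stitch_slices_3d(slice_labels: np.ndarray, min_overlap: int = 1) -> np.ndarray:
    """Link independent per-slice 2D labels, shape (Z, H, W), into 3D labels."""
    n_slices = slice_labels.shape[0]
    # Offset each slice's local labels so every (slice, label) gets a unique global id.
    offsets = np.zeros(n_slices, dtype=np.int64)
    total = 0
    for k in range(n_slices):
        offsets[k] = total
        total += int(slice_labels[k].max())
    global_ids = np.where(slice_labels > 0, slice_labels + offsets[:, None, None], 0)

    uf = UnionFind(total + 1)  # id 0 is background and is never unioned
    base = total + 1
    for k in range(n_slices - 1):
        a = global_ids[k].ravel()
        b = global_ids[k + 1].ravel()
        both = (a > 0) & (b > 0)
        # Encode each overlapping (a, b) pair as one integer to count overlaps.
        keys, counts = np.unique(a[both] * base + b[both], return_counts=True)
        for key, count in zip(keys.tolist(), counts.tolist()):
            if count >= min_overlap:
                uf.union(key // base, key % base)

    # Map each root to a consecutive id; root 0 (background) sorts first and maps to 0.
    roots = np.array([uf.find(g) for g in range(total + 1)])
    _, consecutive = np.unique(roots, return_inverse=True)
    return consecutive[global_ids]
```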
Runs `--method dino --dino-backend mock` on the bundled sample data and validates the artifacts (labels, features, summary) for all 3 scans. Also fixes HDBSCAN on tiny volumes with few patches: single-patch slices are now treated as one region instead of returning empty labels.

- Add scripts/check_dino_acceptance.py (mirrors check_week2_acceptance)
- Add tests/test_dino_acceptance.py
- Wire DINO acceptance gate into scripts/ci_report.py
- Fix _cluster_feature_map to handle small volumes gracefully

https://claude.ai/code/session_015Y9zQk4A8uKJAorKuvBoCk
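One way to express the small-volume guard, as a hedged sketch: the clustering call is passed in as `cluster_fn` (in practice presumably something like scikit-learn's `HDBSCAN(min_cluster_size=...).fit_predict`), and HDBSCAN-style output (-1 for noise) is shifted so that noise becomes background 0. The function name, the `min_patches` threshold, and the label convention are assumptions.

```python
from typing import Callable

import numpy as np


def cluster_feature_map(
    features: np.ndarray,
    cluster_fn: Callable[[np.ndarray], np.ndarray],
    min_patches: int = 2,
) -> np.ndarray:
    """Cluster an (h, w, c) patch-feature grid into an (h, w) label map (0 = noise)."""
    h, w, c = features.shape
    flat = features.reshape(h * w, c)
    if flat.shape[0] < min_patches:
        # Too few patches for density-based clustering: keep the slice as one
        # region instead of returning an all-background label map.
        return np.ones((h, w), dtype=np.int64)
    raw = np.asarray(cluster_fn(flat))  # HDBSCAN-style: -1 = noise, 0..k-1 = clusters
    return (raw + 1).reshape(h, w).astype(np.int64)  # noise -> 0, clusters -> 1..k
```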
Side-by-side comparison on bundled sample data: spot counts, tri-axis label projections, feature distributions, Dice overlap, centroid scatter, and cross-scan consistency metrics. Uses mock backend by default. https://claude.ai/code/session_015Y9zQk4A8uKJAorKuvBoCk
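Two of the comparison metrics are simple enough to sketch directly. This assumes Dice is computed on foreground masks (`labels > 0`) and that cross-scan "spread" means max minus min spot count, which is consistent with the before/after numbers quoted for merge_nearby_labels (35/36/33 gives 3, 18/17/16 gives 2). The function names are hypothetical.

```python
import numpy as np


def dice(labels_a: np.ndarray, labels_b: np.ndarray) -> float:
    """Dice coefficient 2|A∩B| / (|A| + |B|) between the foregrounds of two label maps."""
    a = labels_a > 0
    b = labels_b > 0
    total = int(a.sum()) + int(b.sum())
    if total == 0:
        return 1.0  # both empty: treated as perfect agreement (a convention choice)
    return 2.0 * int(np.logical_and(a, b).sum()) / total


def count_spread(spot_counts: list[int]) -> int:
    """Cross-scan spread of spot counts, taken here as max - min."""
    return max(spot_counts) - min(spot_counts)
```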
- Type encoder param as PatchFeatureEncoder instead of object
- Type backend param as BackendName instead of str
- Vectorize _upsample_labels with np.repeat (was Python double-loop)
- Pre-compute label sizes in _stitch_slices_3d (avoids O(n_pairs*n_pixels))
- Eliminate double Otsu in CLI — compute threshold only in classical branch
- Use result.threshold in summary JSON (works for both methods)
- Make response field zero-size array (no LoG response in DINO path)
- Clarify slice_hw with if/elif/else instead of nested ternary
- Remove os.environ side effect from test file

https://claude.ai/code/session_015Y9zQk4A8uKJAorKuvBoCk
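The vectorized upsampling and the pre-computed label sizes might look like this. It is a sketch assuming square patches of `patch_size` pixels, non-negative integer labels, and a patch grid that covers the whole slice (so cropping to `slice_hw` is enough); `upsample_labels` and `label_sizes` are illustrative stand-ins for `_upsample_labels` and the size table in `_stitch_slices_3d`.

```python
import numpy as np


def upsample_labels(
    patch_labels: np.ndarray, patch_size: int, slice_hw: tuple[int, int]
) -> np.ndarray:
    """Expand an (h_p, w_p) patch-label grid to pixel resolution, cropped to slice_hw."""
    up = np.repeat(np.repeat(patch_labels, patch_size, axis=0), patch_size, axis=1)
    h, w = slice_hw
    return up[:h, :w]  # assumes h_p * patch_size >= h and w_p * patch_size >= w


def label_sizes(labels: np.ndarray) -> np.ndarray:
    """Pixel count per label id in one pass (index i holds the size of label i)."""
    return np.bincount(labels.ravel())
```

`np.bincount` counts every label in a single pass, so each size lookup during stitching is an array index rather than a rescan of the pixels for every candidate pair.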
Compute physically meaningful evolution quantities for each tracked grain: strain (Δd/d₀), misorientation (angular drift in μ and χ), growth/dissolution (relative intensity and volume changes), and shape evolution (anisotropy, covariance trace). Includes summary statistics and flat-table export. https://claude.ai/code/session_015Y9zQk4A8uKJAorKuvBoCk
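A sketch of the per-grain quantities, assuming each tracked grain provides per-scan arrays of d-spacing, μ, χ and intensity (or volume), with the first scan as the reference, plus a 2×2 or 3×3 shape covariance. Combining Δμ and Δχ into one angle with `np.hypot`, and defining anisotropy as 1 - λ_min/λ_max of the covariance, are assumptions made for illustration; the project may define these differently.

```python
import numpy as np


def strain(d: np.ndarray) -> np.ndarray:
    """Lattice strain Δd/d₀ relative to the first scan."""
    return (d - d[0]) / d[0]


def angular_drift(mu: np.ndarray, chi: np.ndarray) -> dict[str, np.ndarray]:
    """Drift in μ and χ (degrees) relative to the first scan, plus a combined angle."""
    d_mu = mu - mu[0]
    d_chi = chi - chi[0]
    # Assumption: small-angle combination of the two drifts into one number.
    return {"d_mu": d_mu, "d_chi": d_chi, "total": np.hypot(d_mu, d_chi)}


def relative_change(x: np.ndarray) -> np.ndarray:
    """Relative change x/x₀ - 1 (growth > 0, dissolution < 0)."""
    return x / x[0] - 1.0


def shape_metrics(cov: np.ndarray) -> dict[str, float]:
    """Anisotropy and trace of a spot's covariance matrix."""
    eig = np.linalg.eigvalsh(cov)  # ascending eigenvalues of a symmetric matrix
    return {
        "anisotropy": float(1.0 - eig[0] / eig[-1]),  # 0 = isotropic (assumed definition)
        "cov_trace": float(np.trace(cov)),
    }
```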
Deterministic 4-grain scenario with analytically known evolution:
- Grain A: linear elastic loading (strain = 0.1%/step)
- Grain B: pure rotation (0.5°/step μ, 0.3°/step χ, no strain)
- Grain C: dissolution (linear intensity/volume decay)
- Grain D: late nucleation (born in scan 2, growing)

Verifies exact recovery of strain, misorientation, growth/dissolution, shape metrics, and summary statistics through the full pipeline.

https://claude.ai/code/session_015Y9zQk4A8uKJAorKuvBoCk
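The scenario can be sketched as a generator of per-scan observables with analytically known trends. The number of scans, d₀, the initial intensity, Grain C's decay rate of 10% of i₀ per scan, and Grain D's growth law are hypothetical choices; only intensity (not volume) is modelled, and NaN marks scans before Grain D is born.

```python
import numpy as np


def synthetic_grains(
    n_scans: int = 6, d0: float = 2.0, i0: float = 100.0
) -> dict[str, dict[str, np.ndarray]]:
    """Per-scan observables for four grains with analytically known evolution."""
    t = np.arange(n_scans, dtype=float)
    zeros = np.zeros(n_scans)
    flat_d = np.full(n_scans, d0)
    flat_i = np.full(n_scans, i0)
    born = t >= 2  # Grain D first appears in scan 2
    return {
        # A: linear elastic loading, strain grows by 0.1% (1e-3) per scan.
        "A": {"d": d0 * (1.0 + 1e-3 * t), "mu": zeros, "chi": zeros, "intensity": flat_i},
        # B: pure rotation, 0.5°/scan in μ and 0.3°/scan in χ, no strain.
        "B": {"d": flat_d, "mu": 0.5 * t, "chi": 0.3 * t, "intensity": flat_i},
        # C: dissolution, intensity decays linearly (assumed rate: 10% of i0 per scan).
        "C": {"d": flat_d, "mu": zeros, "chi": zeros, "intensity": i0 * (1.0 - 0.1 * t)},
        # D: late nucleation; NaN before birth, then growing intensity (assumed law).
        "D": {
            "d": np.where(born, d0, np.nan),
            "mu": np.where(born, 0.0, np.nan),
            "chi": np.where(born, 0.0, np.nan),
            "intensity": np.where(born, 10.0 * (t - 1.0), np.nan),
        },
    }


def first_valid(x: np.ndarray) -> int:
    """Index of the first scan in which a grain is observed."""
    return int(np.flatnonzero(~np.isnan(x))[0])
```

A recovery test can then compare what the pipeline reports against these closed forms, for example a per-scan strain increment of exactly 1e-3 for Grain A and zero strain for Grain B.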
1. Remove all "Week N" references from docstrings, artifact paths,
schema versions, script names, and test names. Replace with
descriptive names: segmentation, tracking, embedding, io.
2. Fix broken import in ablation script (was importing non-existent
_load_feature_csv from track_dataset; now imports load_feature_csv
from _utils).
3. Remove no-op assignment in otsu.py smooth_thresholds (line that
assigned smoothed[outlier] = smoothed[outlier]).
4. Add tests/conftest.py with shared make_spot() fixture factory,
deduplicate _spot() helpers across test files.
5. Standardize CLI arg naming: track_dataset now uses "root" instead
of "indir" to match all other CLI modules.
6. Extract _write_notebook() to shared write_qc_notebook() in
cli/_utils.py, eliminating duplication between segment_dataset
and track_dataset.
Artifact paths: week2 → segmentation, week3 → tracking, week4 → embedding
Schema versions: week2.v1 → segmentation.v1, week3.v1 → tracking.v1,
week4.v1 → embedding.v1/tracking_semantic.v1
Script renames: check_week2_* → check_segmentation_*, etc.
Test renames: test_week1_* → test_io_*, test_week2_* → test_segmentation_*, etc.
https://claude.ai/code/session_015Y9zQk4A8uKJAorKuvBoCk
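A hedged sketch of what a shared `write_qc_notebook()` helper in `cli/_utils.py` could look like, using only the standard library to emit a minimal nbformat 4 notebook. The `(kind, source)` cell format and the function signature are assumptions, not the project's actual interface; segment_dataset and track_dataset would each pass in their own cells.

```python
from __future__ import annotations

import json
from pathlib import Path


def write_qc_notebook(path: str | Path, cells: list[tuple[str, str]]) -> Path:
    """Write a minimal nbformat-4 notebook from (kind, source) pairs."""
    nb_cells = []
    for kind, source in cells:
        if kind not in ("markdown", "code"):
            raise ValueError(f"unsupported cell kind: {kind!r}")
        cell = {"cell_type": kind, "metadata": {}, "source": source}
        if kind == "code":
            # Code cells carry outputs and an execution count (None = not run yet).
            cell["execution_count"] = None
            cell["outputs"] = []
        nb_cells.append(cell)
    notebook = {
        "cells": nb_cells,
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,  # minor version 4 does not require per-cell ids
    }
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(notebook, indent=1) + "\n", encoding="utf-8")
    return out
```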