Enhance segmentation with DINO backend and kinematic tracking #14

Merged
jameslehoux merged 8 commits into main from claude/3dxrd-tracking-week2-Bt9ed
May 18, 2026

@jameslehoux
Contributor

No description provided.

Add merge_nearby_labels() post-processing step that greedily merges
adjacent watershed fragments whose intensity-weighted centroids are
within a configurable distance. This addresses over-segmentation where
a single Bragg spot gets split into multiple watershed basins.

Parameter changes (data-driven from sweep on bundled scans):
- min_seed_separation: 1 → 2 (halves over-splitting, spread=1)
- CLI adds --threshold-fraction and --merge-distance (default 15)
- Notebook segment() uses merge_nearby_labels with distance=15

Before: 35/36/33 spots across 3 scans (spread=3, over-segmented)
After:  18/17/16 spots across 3 scans (spread=2, better consistency)
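
The merge step described above can be sketched roughly as follows. This is a minimal NumPy illustration, not the code added in this commit: the function name `merge_nearby_labels_sketch`, its signature, and the "closest pair first" merge order are assumptions, and the sketch compares centroid distance only, without checking that fragments actually touch.

```python
import numpy as np


def merge_nearby_labels_sketch(labels, intensity, merge_distance=15.0):
    """Greedily merge labels whose intensity-weighted centroids are close.

    Illustrative only: the PR's merge_nearby_labels() may use a different
    signature, adjacency test, or merge order.
    """
    labels = np.asarray(labels)
    flat = labels.ravel()
    weights = np.asarray(intensity, dtype=float).ravel()
    n_bins = int(flat.max()) + 1

    # Per-label total intensity and first moments (sum of intensity * coordinate).
    mass = np.bincount(flat, weights=weights, minlength=n_bins)
    coords = np.indices(labels.shape).reshape(labels.ndim, -1)
    moments = np.stack(
        [np.bincount(flat, weights=weights * axis, minlength=n_bins) for axis in coords],
        axis=1,
    )

    active = [i for i in range(1, n_bins) if mass[i] > 0]  # label 0 is background
    remap = np.arange(n_bins)

    while len(active) > 1:
        centroids = moments[active] / mass[active, None]
        dist = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=-1)
        np.fill_diagonal(dist, np.inf)
        a, b = (int(i) for i in np.unravel_index(np.argmin(dist), dist.shape))
        if dist[a, b] > merge_distance:
            break  # closest remaining pair is too far apart
        keep, drop = active[a], active[b]
        # Merging adds masses and moments, so the merged centroid stays exact.
        mass[keep] += mass[drop]
        moments[keep] += moments[drop]
        remap[remap == drop] = keep
        active.pop(b)

    return remap[labels]
```

Accumulating masses and first moments, rather than averaging centroids, keeps the merged centroid equal to the true intensity-weighted centroid of the combined region.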

https://claude.ai/code/session_015Y9zQk4A8uKJAorKuvBoCk

DINOv3 patch-level features → PCA → HDBSCAN clustering → 3D slice
stitching via union-find, replacing hand-tuned intensity-domain
parameters with learned feature-space representations that generalise
across beamlines and detectors.
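
The last stage of that pipeline, stitching independently labelled 2D slices into 3D regions with union-find, might look something like the sketch below. The function name `stitch_slices_sketch`, the `min_overlap` parameter, and the rule "overlap divided by the smaller region's size" are assumptions for illustration; the PR's own stitching criterion is not shown on this page.

```python
import numpy as np


class UnionFind:
    """Minimal union-find over hashable keys, with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def stitch_slices_sketch(slice_labels, min_overlap=0.5):
    """Link per-slice labels (Z, H, W; 0 = background) into 3D labels."""
    slice_labels = np.asarray(slice_labels)
    n_slices = slice_labels.shape[0]
    uf = UnionFind()

    for z in range(n_slices - 1):
        lower, upper = slice_labels[z], slice_labels[z + 1]
        both = (lower > 0) & (upper > 0)
        if not both.any():
            continue
        # Label sizes are computed once per slice pair, not once per label pair.
        size_lower = np.bincount(lower.ravel())
        size_upper = np.bincount(upper.ravel())
        pairs, counts = np.unique(
            np.stack([lower[both], upper[both]], axis=1), axis=0, return_counts=True
        )
        for (a, b), n in zip(pairs, counts):
            if n / min(size_lower[a], size_upper[b]) >= min_overlap:
                uf.union((z, int(a)), (z + 1, int(b)))

    # Assign one consecutive 3D id per union-find root.
    out = np.zeros_like(slice_labels)
    root_ids = {}
    for z in range(n_slices):
        for lab in np.unique(slice_labels[z]):
            if lab == 0:
                continue
            root = uf.find((z, int(lab)))
            out[z][slice_labels[z] == lab] = root_ids.setdefault(root, len(root_ids) + 1)
    return out
```

Keying the union-find on `(slice index, label)` pairs means the per-slice clusterer is free to reuse label numbers from one slice to the next.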

- Add PatchFeatureEncoder protocol + MockPatchEncoder + TorchDinoPatchEncoder
- Add segment_dino() pipeline with slice extraction, clustering, upsampling,
  3D stitching, and Otsu foreground masking
- Wire --method classical|dino flag and DINO-specific CLI args
- Add scikit-learn>=1.3 to core dependencies (for HDBSCAN)
- Add 16 tests covering mock encoder, clustering, stitching, and end-to-end
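
As a rough illustration of the protocol-plus-mock arrangement listed above, a structural `Protocol` lets a deterministic, dependency-free encoder stand in for the Torch-backed one in tests. Everything here is hypothetical: the `Sketch` class names, the `encode` method, its signature, and the three per-patch statistics are assumptions, not the interfaces defined in this PR.

```python
from typing import Protocol, runtime_checkable

import numpy as np


@runtime_checkable
class PatchFeatureEncoderSketch(Protocol):
    """Anything with an encode() that maps a 2D image to a patch feature map."""

    def encode(self, image: np.ndarray) -> np.ndarray:
        """Return an array of shape (H // patch, W // patch, n_features)."""
        ...


class MockPatchEncoderSketch:
    """Deterministic stand-in: per-patch mean, std, and max intensity."""

    def __init__(self, patch_size: int = 4):
        self.patch_size = patch_size

    def encode(self, image: np.ndarray) -> np.ndarray:
        p = self.patch_size
        h, w = image.shape[0] // p, image.shape[1] // p
        # Crop to a whole number of patches, then split into (h, p, w, p) blocks.
        blocks = np.asarray(image, dtype=float)[: h * p, : w * p].reshape(h, p, w, p)
        return np.stack(
            [blocks.mean(axis=(1, 3)), blocks.std(axis=(1, 3)), blocks.max(axis=(1, 3))],
            axis=-1,
        )
```

Because the protocol is structural, the mock does not need to inherit from it; any object with a matching `encode` method satisfies a parameter typed as the protocol.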

https://claude.ai/code/session_015Y9zQk4A8uKJAorKuvBoCk

Runs `--method dino --dino-backend mock` on bundled sample data and
validates artifacts (labels, features, summary) for all 3 scans. Also
fixes HDBSCAN on tiny volumes where few patches exist — treats
single-patch slices as one region instead of returning empty labels.

- Add scripts/check_dino_acceptance.py (mirrors check_week2_acceptance)
- Add tests/test_dino_acceptance.py
- Wire DINO acceptance gate into scripts/ci_report.py
- Fix _cluster_feature_map to handle small volumes gracefully

https://claude.ai/code/session_015Y9zQk4A8uKJAorKuvBoCk

Side-by-side comparison on bundled sample data: spot counts, tri-axis
label projections, feature distributions, Dice overlap, centroid scatter,
and cross-scan consistency metrics. Uses mock backend by default.

https://claude.ai/code/session_015Y9zQk4A8uKJAorKuvBoCk

- Type encoder param as PatchFeatureEncoder instead of object
- Type backend param as BackendName instead of str
- Vectorize _upsample_labels with np.repeat (was Python double-loop)
- Pre-compute label sizes in _stitch_slices_3d (avoids O(n_pairs*n_pixels))
- Eliminate double Otsu in CLI — compute threshold only in classical branch
- Use result.threshold in summary JSON (works for both methods)
- Make response field zero-size array (no LoG response in DINO path)
- Clarify slice_hw with if/elif/else instead of nested ternary
- Remove os.environ side effect from test file
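
The `np.repeat` vectorization mentioned in the list above can be illustrated as follows. This is a sketch with hypothetical function names, assuming a 2D patch-level label map and a square patch size; the loop version is included only to show what is being replaced.

```python
import numpy as np


def upsample_labels_loop(patch_labels, patch_size):
    """Reference version: fill each patch with a Python double loop."""
    h, w = patch_labels.shape
    out = np.zeros((h * patch_size, w * patch_size), dtype=patch_labels.dtype)
    for i in range(h):
        for j in range(w):
            out[
                i * patch_size : (i + 1) * patch_size,
                j * patch_size : (j + 1) * patch_size,
            ] = patch_labels[i, j]
    return out


def upsample_labels_repeat(patch_labels, patch_size):
    """Vectorized version: repeat every row, then every column."""
    return np.repeat(np.repeat(patch_labels, patch_size, axis=0), patch_size, axis=1)
```

Both functions produce the same nearest-neighbour upsampling; the second one moves the per-patch loop out of Python.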

https://claude.ai/code/session_015Y9zQk4A8uKJAorKuvBoCk

Compute physically meaningful evolution quantities for each tracked grain:
strain (Δd/d₀), misorientation (angular drift in μ and χ), growth/dissolution
(relative intensity and volume changes), and shape evolution (anisotropy,
covariance trace). Includes summary statistics and flat-table export.
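
A few of the quantities named above reduce to short array expressions. The sketch below is an assumption-laden illustration rather than the module added here: the function names are hypothetical, the first scan is taken as the reference state (d₀, initial angle, initial intensity), and angles are assumed to be in degrees.

```python
import numpy as np


def strain(d_spacing, d0=None):
    """Lattice strain (d - d0) / d0; d0 defaults to the first scan's value."""
    d = np.asarray(d_spacing, dtype=float)
    ref = d[0] if d0 is None else float(d0)
    return (d - ref) / ref


def angular_drift_deg(angle_deg):
    """Drift relative to the first scan, wrapped into [-180, 180) degrees."""
    a = np.asarray(angle_deg, dtype=float)
    return (a - a[0] + 180.0) % 360.0 - 180.0


def relative_change(values):
    """Fractional change relative to the first scan (intensity or volume)."""
    v = np.asarray(values, dtype=float)
    return v / v[0] - 1.0


def covariance_trace(cov):
    """Trace of a spot's coordinate covariance matrix, a simple size measure."""
    return float(np.trace(np.asarray(cov, dtype=float)))
```

Wrapping the angular difference matters when μ or χ crosses the 0°/360° boundary between scans; without it a 1° rotation could be reported as a 359° drift.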

https://claude.ai/code/session_015Y9zQk4A8uKJAorKuvBoCk

Deterministic 4-grain scenario with analytically known evolution:
- Grain A: linear elastic loading (strain = 0.1%/step)
- Grain B: pure rotation (0.5°/step μ, 0.3°/step χ, no strain)
- Grain C: dissolution (linear intensity/volume decay)
- Grain D: late nucleation (born scan 2, growing)

Verifies exact recovery of strain, misorientation, growth/dissolution,
shape metrics, and summary statistics through the full pipeline.
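
A scenario of this kind could be generated along the following lines. The per-step strain and rotation rates come from the list above; everything else (the function name, the dictionary layout, `d0`, `i0`, the 15% per-step decay, and grain D's growth rate) is a made-up placeholder, not the fixture used in the PR's tests.

```python
import numpy as np


def synthetic_tracks(n_scans=5, d0=2.0, i0=1000.0):
    """Build per-grain time series with analytically known evolution."""
    k = np.arange(n_scans, dtype=float)
    zeros = np.zeros(n_scans)
    const_d = np.full(n_scans, d0)
    const_i = np.full(n_scans, i0)

    # Grain D does not exist before scan 2, so its earlier entries are NaN.
    born = k >= 2
    absent = np.where(born, 0.0, np.nan)

    return {
        # A: elastic loading, strain grows by 0.1% per step.
        "A": {"d": d0 * (1.0 + 1e-3 * k), "mu": zeros, "chi": zeros, "intensity": const_i},
        # B: pure rotation, 0.5 deg/step in mu and 0.3 deg/step in chi.
        "B": {"d": const_d, "mu": 0.5 * k, "chi": 0.3 * k, "intensity": const_i},
        # C: dissolution, intensity decays linearly (placeholder rate).
        "C": {"d": const_d, "mu": zeros, "chi": zeros, "intensity": i0 * (1.0 - 0.15 * k)},
        # D: late nucleation, grows linearly after scan 2 (placeholder rate).
        "D": {
            "d": const_d + absent,
            "mu": absent,
            "chi": absent,
            "intensity": i0 * (0.2 + 0.2 * (k - 2.0)) + absent,
        },
    }
```

Because each series is a closed-form function of the scan index, a test can compare recovered strain or drift against the generating formula with a tight tolerance instead of a fitted threshold.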

https://claude.ai/code/session_015Y9zQk4A8uKJAorKuvBoCk

1. Remove all "Week N" references from docstrings, artifact paths,
   schema versions, script names, and test names. Replace with
   descriptive names: segmentation, tracking, embedding, io.

2. Fix broken import in ablation script (was importing non-existent
   _load_feature_csv from track_dataset; now imports load_feature_csv
   from _utils).

3. Remove no-op assignment in otsu.py smooth_thresholds (line that
   assigned smoothed[outlier] = smoothed[outlier]).

4. Add tests/conftest.py with shared make_spot() fixture factory,
   deduplicate _spot() helpers across test files.

5. Standardize CLI arg naming: track_dataset now uses "root" instead
   of "indir" to match all other CLI modules.

6. Extract _write_notebook() to shared write_qc_notebook() in
   cli/_utils.py, eliminating duplication between segment_dataset
   and track_dataset.

Artifact paths: week2 → segmentation, week3 → tracking, week4 → embedding
Schema versions: week2.v1 → segmentation.v1, week3.v1 → tracking.v1,
                 week4.v1 → embedding.v1/tracking_semantic.v1
Script renames: check_week2_* → check_segmentation_*, etc.
Test renames: test_week1_* → test_io_*, test_week2_* → test_segmentation_*, etc.

https://claude.ai/code/session_015Y9zQk4A8uKJAorKuvBoCk

@jameslehoux merged commit 1dc96d3 into main on May 18, 2026
10 checks passed