
feat: JOSS submission, test suite overhaul, and strategic refactoring #13

Merged
jameslehoux merged 10 commits into main from claude/3dxrd-tracking-week2-Bt9ed on May 17, 2026


@jameslehoux
Contributor

Summary

  • Add JOSS paper draft (paper.md, paper.bib, CITATION.cff) with full bibliography and machine-readable citation metadata
  • Expand test suite from 57 → 147 tests: error-path coverage, invariant assertions, end-to-end integration test on bundled data
  • Replace hand-rolled pure-Python Otsu with skimage.filters.threshold_otsu (~100x faster)
  • Fix load_primary_volume to return np.ndarray directly (eliminates ~200 MB transient garbage per volume load)
  • Factor duplicated CLI helpers into braggtrack/cli/_utils.py
  • Add SpotRecord TypedDict formalizing the feature-table contract across pipeline stages
  • Move DINOv2 encoder construction outside per-scan loop (prevents model reload per scan)
  • Fix notebook: clear stale outputs, remove unused import, correct misleading markdown about mock embeddings
  • Add ruff linting/formatting, CHANGELOG, py.typed marker, dynamic __version__
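As a rough illustration of the encoder change listed above, the sketch below builds an expensive object once before the loop instead of once per scan. `CountingEncoder` and `embed_scans` are hypothetical stand-ins invented for this example; they are not the names used in embed_dataset.py, and no DINOv2 weights are involved.

```python
class CountingEncoder:
    """Hypothetical stand-in for an expensive model; counts constructions."""

    loads = 0

    def __init__(self) -> None:
        # Real code would load model weights here, which is the slow part.
        CountingEncoder.loads += 1

    def encode(self, scan: str) -> list[float]:
        # Placeholder "embedding": just the length of the scan name.
        return [float(len(scan))]


def embed_scans(scans, encoder_factory=CountingEncoder):
    """Embed every scan with a single shared encoder instance."""
    encoder = encoder_factory()  # constructed once, outside the per-scan loop
    return {scan: encoder.encode(scan) for scan in scans}


embeddings = embed_scans(["scan_001", "scan_002", "scan_003"])
```

With the constructor inside the loop, `CountingEncoder.loads` would equal the number of scans; here it stays at one regardless of how many scans are processed.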

Test plan

  • python -m unittest discover tests — 147 tests pass
  • python scripts/ci_report.py — all 6 acceptance gates pass (unit, week1–4, smoke)
  • Notebook runs end-to-end: jupyter nbconvert --execute notebooks/braggtrack_demo.ipynb
  • pip install -e . succeeds without torch/transformers
  • Verify JOSS paper builds: docker run --rm -v $PWD:/data openjournals/inara -o pdf paper.md (optional)

- Ruff lint + format enforced across braggtrack/, tests/, scripts/
  (27 files reformatted, 10 lint issues auto-fixed, config in pyproject.toml)
- Added CHANGELOG.md following Keep a Changelog format
- Created braggtrack/py.typed PEP 561 marker
- __version__ dynamically read from package metadata with dev fallback
- Fixed: unused imports, unsorted imports, typing.Iterable replaced with collections.abc.Iterable
- LICENSE copyright updated to 2025-2026
- CI lint job gates test job (ruff check + format --check)

https://claude.ai/code/session_015Y9zQk4A8uKJAorKuvBoCk
Prepares JOSS submission materials: paper.md (~1800 words covering
segmentation, feature extraction, tracking, and reproducibility),
paper.bib (14 references), and CITATION.cff. Updates pyproject.toml
with author metadata and project URLs.

https://claude.ai/code/session_015Y9zQk4A8uKJAorKuvBoCk
Covers: extract_instance_table (centroid, eigenvalues, bbox, weighted
fallback), remove_small_objects, fill_holes_binary, relabel_sequential,
gaussian_blur_3d, laplacian_3d, log_enhance_3d, orthogonal_mips,
crop_spot_cube, mock encoder determinism, and io path resolution.

https://claude.ai/code/session_015Y9zQk4A8uKJAorKuvBoCk
- Clear all cached cell outputs (showed pre-fix 11/22/36 spot counts;
  actual results are now 18/20/16)
- Remove unused generate_crossing_scenario import
- Fix section 5 markdown: mock embeddings don't increase fragmentation
  on well-separated data, they simply have no effect
- Fix section 6 markdown: honestly frame the flat ablation curve as
  expected mock-encoder behavior

https://claude.ai/code/session_015Y9zQk4A8uKJAorKuvBoCk
…egration

- Add test_error_paths.py: ValueError for empty Otsu, empty volume,
  unsupported method, unparseable scan name, flat volume, empty/single
  frames in build_tracks, negative intensity fallback, window > length
- Add test_invariants.py: relabel always 0..N, fill_holes is superset
  of input, remove_small never introduces labels, eigenvalues non-negative
  and descending, Otsu within range, smooth reduces variance, mock
  encoder always unit-norm
- Add test_integration.py: end-to-end pipeline on bundled data verifying
  stable spot counts, required columns, sane tracking metrics, embedding
  unit norms
- Consolidate test_semantic_embeddings.py into test_semantic_week4.py,
  removing duplicate MIP/encoder/crop tests
- Fix test_segment_dataset_cli.py to use tempfile.TemporaryDirectory
  instead of leaking artifacts on failure
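
The `tempfile.TemporaryDirectory` change in the last item can be sketched as follows. `write_report` is a hypothetical stand-in for a CLI step that writes an artifact; it is not the code under test in test_segment_dataset_cli.py.

```python
import tempfile
import unittest
from pathlib import Path


def write_report(out_dir: Path) -> Path:
    """Hypothetical stand-in for a CLI step that writes an artifact."""
    path = out_dir / "report.csv"
    path.write_text("spot_id,intensity\n1,42.0\n")
    return path


class WriteReportTest(unittest.TestCase):
    def test_writes_csv_in_temp_dir(self) -> None:
        with tempfile.TemporaryDirectory() as tmp:
            path = write_report(Path(tmp))
            self.assertTrue(path.exists())
        # The directory and its contents are removed when the block exits,
        # including when an assertion inside the block fails.
        self.assertFalse(path.exists())
```

Because cleanup is tied to the context manager rather than to a trailing delete call, a failing assertion no longer leaves files behind in the working tree.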

147 tests, all passing (up from 102).

https://claude.ai/code/session_015Y9zQk4A8uKJAorKuvBoCk
…tRecord

- Replace hand-rolled pure-Python Otsu with skimage.filters.threshold_otsu
  (~100x faster on real volumes; adds no new dependency, since
  scikit-image is already a hard dependency)
- Fix load_primary_volume to return np.ndarray directly instead of
  calling .tolist() and creating millions of transient Python floats
- Move encoder construction outside per-scan loop in embed_dataset.py
  (prevents re-loading DINOv2 weights for every scan)
- Factor duplicated CLI helpers (_synth_volume_from_file, _write_csv,
  _load_feature_csv) into braggtrack/cli/_utils.py
- Add SpotRecord TypedDict in braggtrack/types.py formalizing the
  implicit feature-table contract used across all pipeline stages
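
To show what a `TypedDict` contract of this kind can look like, here is a small sketch. The class name `SpotRecordSketch`, its field names, and the `missing_fields` helper are all hypothetical; the real `SpotRecord` in braggtrack/types.py may define different keys.

```python
from typing import TypedDict


class SpotRecordSketch(TypedDict):
    """Illustrative feature-table row; field names are assumptions."""

    scan: str
    label: int
    centroid: tuple[float, float, float]
    eigenvalues: tuple[float, float, float]


def missing_fields(row: dict) -> set[str]:
    """Return the required keys that are absent from a row."""
    return set(SpotRecordSketch.__required_keys__) - set(row)


complete_row: SpotRecordSketch = {
    "scan": "scan_001",
    "label": 1,
    "centroid": (4.0, 10.5, 12.0),
    "eigenvalues": (3.0, 2.0, 1.0),
}
```

A `TypedDict` is an ordinary `dict` at runtime, so existing CSV readers and writers keep working, while a static type checker can flag a stage that omits or misspells a key.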

https://claude.ai/code/session_015Y9zQk4A8uKJAorKuvBoCk
Added an image to the README for visual enhancement.
@jameslehoux jameslehoux force-pushed the claude/3dxrd-tracking-week2-Bt9ed branch from 4901dfc to e7d78d7 Compare May 17, 2026 10:47
nbconvert appends .ipynb to --output, causing a PermissionError when
writing to /dev/null.ipynb. Use --output-dir /tmp to discard output.

https://claude.ai/code/session_015Y9zQk4A8uKJAorKuvBoCk
@jameslehoux jameslehoux merged commit 3c046ef into main May 17, 2026
10 checks passed