Camera ready analysis2 migration #7

Merged
McClain-Thiel merged 5 commits into main from camera-ready-analysis2-migration on May 13, 2026
@McClain-Thiel

Collaborator

No description provided.

McClain-Thiel and others added 5 commits May 1, 2026 11:15
Resamples rejection-sampling and best-of-16 baselines at the sweep-optimal
temperature per model (Base 1.0, SFT 1.0, GRPO 1.15) and re-runs strict QC
through the actual analysis2 pipeline (BLAST against curated oriDB +
AMRFinder + repeats2 + two-stage filter). Replaces the earlier in-process
plasmidkit Scorer numbers, which over-counted ORIs by treating overlapping
ColE1/pBR322/f1 motifs as separate features.
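
The over-counting described above can be illustrated with a minimal sketch, assuming each ORI hit is a linear, end-exclusive interval. The function name, the coordinates, and the merge rule are hypothetical and are not taken from plasmidkit or the analysis2 pipeline; wraparound on circular plasmids is ignored.

```python
from typing import List, Tuple

# (start, end, motif name); end-exclusive, coordinates are illustrative only.
Hit = Tuple[int, int, str]


def merge_overlapping_hits(hits: List[Hit]) -> List[Tuple[int, int, List[str]]]:
    """Collapse hits whose coordinates overlap into a single locus."""
    merged: List[Tuple[int, int, List[str]]] = []
    for start, end, name in sorted(hits):
        if merged and start < merged[-1][1]:
            # Overlaps the previous locus: extend it instead of counting a new one.
            prev_start, prev_end, names = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), names + [name])
        else:
            merged.append((start, end, [name]))
    return merged


# Hypothetical hits: two motif families annotated on the same region, one elsewhere.
hits = [(100, 700, "ColE1"), (150, 720, "pBR322"), (2000, 2450, "f1")]
loci = merge_overlapping_hits(hits)
print(len(hits), "raw hits ->", len(loci), "loci")
```

Counting `len(hits)` treats every motif match as its own origin, while counting `len(loci)` treats overlapping matches as one feature.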

Outputs land in new bucket prefixes (rejection_sampling_v2/, best_of_16_v2/,
diversity_probe/) — existing v1 paths are untouched. Each cell records
SHA-256 of outputs.csv + the full-column SHA, with cross-cell and cross-
version distinctness checks in manifest.json.
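
A per-cell hash with a distinctness check could look roughly like the sketch below. It covers only the file-level SHA-256 of outputs.csv; the manifest layout, the function names, and the cell names are assumptions made for illustration, not the schema of the real manifest.json.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large CSVs are never fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(cells: dict) -> dict:
    """Hash each cell's outputs.csv and record whether all hashes differ."""
    hashes = {name: sha256_file(path) for name, path in cells.items()}
    return {
        "cells": hashes,
        "all_distinct": len(set(hashes.values())) == len(hashes),
    }


# Demo on throwaway files; the cell names and CSV contents are made up.
with tempfile.TemporaryDirectory() as tmp:
    cells = {}
    for name, body in {"base": "id,seq\n1,ACGT\n", "grpo": "id,seq\n1,TTGA\n"}.items():
        path = Path(tmp) / name / "outputs.csv"
        path.parent.mkdir()
        path.write_text(body)
        cells[name] = path
    manifest = build_manifest(cells)
    print(json.dumps(manifest, indent=2))
```

A duplicated hash across cells or across the v1 and v2 prefixes would suggest that one output was copied rather than resampled, which is the kind of mistake a distinctness check is meant to surface.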

Untrack private and cluster-specific paths (anyscale/, scripts/, experiments/,
sweeps/, AGENTS.md, ICML_REVISION_LOG.md, rebuttal.md, reviews.md). They stay
on disk and remain gitignored going forward.

Add MIT LICENSE (placeholder copyright holder until decided) and CITATION.cff
for ICML 2026.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Rename src/ to plasmidrl/ and update every `from src.*` to `from plasmidrl.*`
- Remove plasmidrl/main.py (Click CLI was unused)
- Bring the analysis2 QC pipeline into the package:
    plasmidrl/qc/blast_ori.py  (was analysis2/src/qc/qc_oriv_arg2.py)
    plasmidrl/qc/repeats.py    (was analysis2/src/qc/repeats2.py)
    plasmidrl/qc/filter.py     (was analysis2/src/qc/filter_qc_two_stage2.py)
    plasmidrl/qc/__init__.py   public run_strict_qc orchestrator
- Move reference assets:
    analysis2/assets/oriV_refs.fasta  → data/canonical_oris.fasta
    analysis2/assets/annotations/      → data/annotations/
- Drop the old Jupyter notebooks, the figures script, and the on-disk
  training parquets (now distributed via the public bucket)
- Move legacy paper figures into paper/figures/
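
The import rewrite in the first bullet can be sketched with a regular expression. This is an illustrative helper, not the script used for this commit; it only touches `from src...` and `import src...` statements at the start of a line and leaves any other references to `src` alone.

```python
import re

# Match "from src" or "import src" at the start of a line, followed by "." or whitespace,
# so that unrelated packages such as "srcs" are not rewritten.
_IMPORT = re.compile(r"^(\s*)(from|import)\s+src(?=[.\s])", flags=re.MULTILINE)


def rewrite_imports(source: str) -> str:
    """Point `src` imports at the renamed `plasmidrl` package."""
    return _IMPORT.sub(r"\1\2 plasmidrl", source)


example = "from src.qc import filter\nimport src.utils\nfrom srcs import other\n"
rewritten = rewrite_imports(example)
print(rewritten)
```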

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- plasmidrl.data: read-only loader for UCL-CSSB/PlasmidRL-ICML.
  load() / load_csv() / load_json() / load_fasta() / list_tree().
  Cached under ~/.cache/plasmidrl/ (override via PLASMIDRL_CACHE_DIR).
- plasmidrl.models: canonical Hugging Face paths (BASE / SFT / RL),
  the per-cell optimal temperatures used in every headline result,
  and a lazy load() that imports transformers on demand.
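
The cache override and the lazy import described in the two bullets above might be arranged as in the sketch below. The function bodies are assumptions and not the package's actual code: the mapping of the GRPO temperature to the `RL` key, the use of `AutoModelForCausalLM`, and passing the cache directory to `from_pretrained` are all illustrative choices.

```python
import importlib
import os
from pathlib import Path

# Temperatures quoted in this PR's first commit message (Base 1.0, SFT 1.0, GRPO 1.15).
# Keying the GRPO value under "RL" is an assumption.
OPTIMAL_TEMPERATURE = {"BASE": 1.0, "SFT": 1.0, "RL": 1.15}


def cache_dir() -> Path:
    """Return PLASMIDRL_CACHE_DIR if set, else ~/.cache/plasmidrl."""
    override = os.environ.get("PLASMIDRL_CACHE_DIR")
    return Path(override) if override else Path.home() / ".cache" / "plasmidrl"


def load(model_id: str):
    """Load a model, importing transformers only when this function is called."""
    # Deferring the import keeps `import plasmidrl.models` cheap for users
    # who only need the constants above.
    transformers = importlib.import_module("transformers")
    return transformers.AutoModelForCausalLM.from_pretrained(
        model_id, cache_dir=str(cache_dir())
    )
```

With this layout, reading `OPTIMAL_TEMPERATURE` or calling `cache_dir()` works without transformers installed; only `load()` needs it.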

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@McClain-Thiel McClain-Thiel merged commit 88c4362 into main May 13, 2026
1 check failed