evo2 SAE eval (2/2): probing harness + CLI (on #1630)#1636
evo2 SAE eval (2/2): probing harness + CLI (on #1630)#1636polinabinder1 wants to merge 3 commits into
Conversation
The 'how to run it' half: probe.py (extract/auroc/linear/codon-aa/euk-f1/domain-eval/ annotate CLI), evo2_buffer (Evo2 -> ActivationBuffer), probe_loss_recovered (fidelity). Imports the label producers from the base PR + sae.eval.probing from #1629. Signed-off-by: Polina Binder <pbinder@nvidia.com>
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughThree new scripts implement a complete Evo2 sparse autoencoder interpretability suite. ChangesEvo2 SAE Interpretability Probing
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
cmd_auroc/cmd_linear/cmd_annotate each repeated 'load buffer + resolve names (filtered to buffer, default all)'. One _load(a) -> (buf, names) helper. CPU annotate smoke still green. Signed-off-by: Polina Binder <pbinder@nvidia.com>
|
@coderabbitai review |
✅ Action performedReview finished.
|
|
@coderabbitai review |
✅ Action performedReview finished.
|
There was a problem hiding this comment.
Actionable comments posted: 6
🧹 Nitpick comments (2)
bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/scripts/evo2_buffer.py (1)
37-52: ⚡ Quick winAdd Google-style docstrings instead of suppressing D103 in
bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/scripts/evo2_buffer.pyandbionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/scripts/probe.py.The shared root cause is the same in both files: new public helpers and CLI entry points are being introduced with
# noqa: D103(or no docstring) instead of repo-required docstrings. Please document the FASTA/sampling helpers inevo2_buffer.pyand the public CLI commands inprobe.pywith Google-style docstrings rather than suppressing the lint rule. As per coding guidelines,**/*.py: Use Google-style docstrings (pydocstyle convention) in Python code and Use Ruff for Python linting and formatting with Google-style docstrings.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/scripts/evo2_buffer.py` around lines 37 - 52, Replace the "# noqa: D103" suppressions by adding Google-style docstrings for the public helpers and CLI entry points: add a short summary, Args, Returns, and Examples where appropriate to the read_fasta and sample_sequences functions in evo2_buffer.py (document parameters like path, fasta, max_tokens, seq_len, kingdoms, seed and what is yielded/returned), and likewise add Google-style docstrings to the public CLI functions/commands in probe.py (document command purpose, parameters/flags and exit behavior); ensure docstrings follow pydocstyle/Google conventions so Ruff/linting passes.Source: Coding guidelines
bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/scripts/probe_loss_recovered.py (1)
51-85: ⚡ Quick winReplace the
D10xsuppressions with actual Google-style docstrings.This new file opts out of pydocstyle for
SAEWrap,L26Hook, andmain()instead of documenting the new CLI surface and hook behavior. Adding concise Google-style docstrings here is a small cleanup that keeps the script aligned with the repository standard.As per coding guidelines,
**/*.py: "Ensure all Python files follow Google-style docstrings (pydocstyle convention)" and "Use Ruff for Python linting and formatting with Google-style docstrings".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/scripts/probe_loss_recovered.py` around lines 51 - 85, Add Google-style docstrings for SAEWrap, L26Hook, and main() instead of suppressing pydocstyle (remove the noqa D10x markers). For SAEWrap, document the class purpose, the constructor parameter sae (type and expected interface: encode, decoder, pre_bias, optional normalize_input) and describe forward(x) return values (recon, codes) and normalization behavior. For L26Hook, document the class purpose, the attributes mode/override/captured, valid mode values ("off", "capture", "replace"), and __call__ behavior (how it captures or replaces the first tensor in output and dtype handling). For main(), add a concise docstring describing the CLI surface and what the script does when executed (setup, hooking behavior, and overall flow). Ensure docstrings follow Google style for one-line summary, optional Args/Returns sections where useful.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/scripts/probe_loss_recovered.py`:
- Around line 107-115: The code can proceed with an empty `batches` list and
produce misleading perfect scores; add a fail-fast guard after building
`batches` (after the loop that calls `sample_sequences`, `engine.tokenize`, and
appends to `batches`) that checks `if not batches:` and raises a clear exception
(e.g., ValueError) explaining no evaluable sequences were produced (mention
`--n-seqs`, filtered FASTA, or token length <= 4) so the evaluator (which
computes `ce_original`, `ce_sae`, `ce_zero` and `loss_recovered`) never runs on
empty data.
In
`@bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/scripts/probe.py`:
- Around line 343-351: The --auroc-device argument in _add_model_args is
hardcoded to "cuda:1", causing failures on single-GPU hosts; change its default
to match the primary device (e.g., "cuda:0") or make it None and resolve to the
same device as --device at runtime; update the same hardcoded default at the
other occurrence (lines referenced in the review) so all model-backed buffers
use the primary device by default (reference: _add_model_args and the
--auroc-device argument).
- Around line 19-27: The CLI help lists a "loss-recovered" subcommand but main()
never registers it; update main() to add a subparser for "loss-recovered" (using
the existing ArgumentParser.add_subparsers usage), set its help text to match
the help block, and wire it to the actual handler (e.g.,
set_defaults(func=probe_loss_recovered.main) or call the probe_loss_recovered
entrypoint) so invoking "probe.py loss-recovered" runs the intended logic; also
audit the similar missing registrations noted around the 354-405 region and
register any other commands mentioned in the top help that lack corresponding
subparsers.
- Around line 86-90: The _load function currently silently filters unknown
labels; change it to validate requested labels and fail fast: after loading buf
via ActivationBuffer.load(a.acts) and parsing labels from a.labels or
buf.label_names, compute missing = [t for t in labels if t not in buf.name_idx]
and if missing is non-empty raise a ValueError (or SystemExit) with a clear
message listing the unknown label names and available label names; otherwise
return buf and the requested labels in the original order (not filtered).
- Around line 114-120: The current guard skips only all-zero test labels but not
all-one test labels, causing best_single_train_test and auroc_vec to receive
single-class targets; update the conditional that checks ytr and yte so it also
treats an all-ones yte as invalid (e.g., check yte.sum() in (0, len(yte)) or
equivalent) and set out[n] to (nan,nan) in that case; modify the logic around
ytr, yte, the existing if-block, and the block that calls fit_logreg,
best_single_train_test and auroc_vec so both all-negative and all-positive test
folds are skipped.
- Around line 159-165: The codon→aa probe is leaking test info because Xz is
standardized with full-dataset mean/std; compute mean/std from the training
split and apply that normalization to both train and test instead. Specifically,
after calling split_indices(...) (tr, te), compute mu and sigma from X[tr] (use
X[tr].mean(0) and X[tr].std(0)+1e-6), then set Xz = (X - mu)/sigma before
calling decode_eval and fit_softmax; also ensure trn is chosen as the
intersection of tr with the non-hidx indices (instead of using the full-dataset
non-hidx mask) so fit_softmax(Xz[trn], ...) only uses training examples.
---
Nitpick comments:
In
`@bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/scripts/evo2_buffer.py`:
- Around line 37-52: Replace the "# noqa: D103" suppressions by adding
Google-style docstrings for the public helpers and CLI entry points: add a short
summary, Args, Returns, and Examples where appropriate to the read_fasta and
sample_sequences functions in evo2_buffer.py (document parameters like path,
fasta, max_tokens, seq_len, kingdoms, seed and what is yielded/returned), and
likewise add Google-style docstrings to the public CLI functions/commands in
probe.py (document command purpose, parameters/flags and exit behavior); ensure
docstrings follow pydocstyle/Google conventions so Ruff/linting passes.
In
`@bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/scripts/probe_loss_recovered.py`:
- Around line 51-85: Add Google-style docstrings for SAEWrap, L26Hook, and
main() instead of suppressing pydocstyle (remove the noqa D10x markers). For
SAEWrap, document the class purpose, the constructor parameter sae (type and
expected interface: encode, decoder, pre_bias, optional normalize_input) and
describe forward(x) return values (recon, codes) and normalization behavior. For
L26Hook, document the class purpose, the attributes mode/override/captured,
valid mode values ("off", "capture", "replace"), and __call__ behavior (how it
captures or replaces the first tensor in output and dtype handling). For main(),
add a concise docstring describing the CLI surface and what the script does when
executed (setup, hooking behavior, and overall flow). Ensure docstrings follow
Google style for one-line summary, optional Args/Returns sections where useful.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 5bc44345-d335-4a1a-b937-a1042492fd94
📒 Files selected for processing (3)
bionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/scripts/evo2_buffer.pybionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/scripts/probe.pybionemo-recipes/interpretability/sparse_autoencoders/recipes/evo2/scripts/probe_loss_recovered.py
…standardization, robustness - cmd_auroc/_eval_matrix: skip all-positive test folds (AUROC undefined), not just all-negative - codon-aa: standardize on the train split only (was leaking test-set mean/std) - _load: reject unknown --labels explicitly instead of silently dropping them - --auroc-device now defaults to --device (was hardcoded cuda:1; broke single-GPU) - probe_loss_recovered: fail fast when no evaluable sequences were built - help: loss-recovered is a separate script, not a subcommand Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Polina Binder <pbinder@nvidia.com>
Summary
The probing harness / CLI ("how to run it"), stacked on the label producers (#1630).
Contents
probe.py—extract/auroc/linear/codon-aa/euk-f1/domain-eval/annotateevo2_buffer.py— Evo2 engine →ActivationBufferprobe_loss_recovered.py— fidelity via sharedsae.eval.loss_recoveredHow to run (
$CKPT=mbridge dir,$SAE=trained SAE .pt)Imports labelers (#1630) + scoring/annotate (#1629); needs the
evo2_saeengine (#1622) importable in the env.Summary by CodeRabbit
Expected output
extract→ writesbuf.npz(codes[N_tokens, n_features]+ per-token labels); prints the sequence/token counts.auroc→ tablelabel %pos best AUROC feature. Expect strong detectors on top —base_A/C/G/T≈ 0.95–1.0 (a feature firing on one nucleotide separates it cleanly), andmotif_ATG/cds_codingin the 0.85–0.95 band for the ~5 monosemantic features found at 7B/L26.annotate→"[annotate] K features labeled (AUROC ≥ 0.85) over L concepts → parquet"+{feature_id, label, auroc, …}.domain-eval→ per-conceptdomF1 / AUROC / %pos; exon/cds should beat the shuffle null.Measured (7B/L26): variance-explained ~89.5%, dead latents ~21%, ~5 features with motif AUROC > 0.85. Base-detector AUROC is expected near 1.0 but not yet measured — extract→auroc run pending the GPU env.