Fix(Spectronaut): Enable annotation to be added to input #18

Merged
tonywu1999 merged 3 commits into devel from MSstatsBig/work/20260526_bigspectronaut_annotation_param
May 26, 2026
@tonywu1999 tonywu1999 commented May 26, 2026

Contributor
  • Added annotation = NULL parameter to bigSpectronauttoMSstatsFormat (second positional arg mirroring bigDIANNtoMSstatsFormat from Fix annotationt issue #16).
  • When supplied, the converter merges the annotation onto the output via MSstatsAddAnnotationBig, overriding any Condition / BioReplicate columns that came from R.Condition / R.Replicate.
  • Required for paired designs and other experimental layouts that Spectronaut's own annotation cannot express.
  • Added override test under tests/testthat/test-converters.R.

Motivation and Context

Spectronaut's native annotation capabilities cannot express all experimental layouts, particularly paired designs and other complex design patterns. This PR extends bigSpectronauttoMSstatsFormat to accept an optional annotation parameter, mirroring functionality added to bigDIANNtoMSstatsFormat in PR #16. When provided, the annotation data overrides the Condition and BioReplicate columns derived from Spectronaut's embedded R.Condition and R.Replicate fields, enabling users to specify custom experimental designs.
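
As an illustration of the paired layout mentioned above, the following is a minimal sketch of an annotation table in which each subject is measured under both conditions. The run names and file names are hypothetical, and the sketch assumes the annotation is keyed by a Run column with Condition and BioReplicate columns, as in the test added by this PR. The converter call is left commented out and uses named arguments only, so it does not depend on where annotation sits in the signature.

```r
# Hypothetical paired design: each subject (BioReplicate) is measured once
# per condition, so the same BioReplicate ID appears under both conditions.
annotation <- data.frame(
  Run          = c("run_01", "run_02", "run_03", "run_04"),  # hypothetical run names
  Condition    = c("ctrl", "treat", "ctrl", "treat"),
  BioReplicate = c(1L, 1L, 2L, 2L),
  stringsAsFactors = FALSE
)

# Sanity check: every subject is observed exactly once under every condition.
design <- table(annotation$BioReplicate, annotation$Condition)
is_paired <- all(design == 1L)
print(design)
print(is_paired)

# Passing the table to the converter (not run here; the file names are
# placeholders and every argument is named):
# result <- MSstatsBig::bigSpectronauttoMSstatsFormat(
#   input_file       = "spectronaut_report.csv",
#   output_file_name = "output.csv",
#   backend          = "arrow",
#   annotation       = annotation
# )
```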

Changes

  • Function signature updated: bigSpectronauttoMSstatsFormat now accepts annotation = NULL as the second positional parameter (before output_file_name)

  • Annotation processing logic: When annotation is supplied, the function merges it onto the output using MSstatsAddAnnotationBig, overriding Spectronaut-derived Condition/BioReplicate values

  • File system handling: For the arrow backend, the function unlinks the existing output file and writes the updated annotated dataset back to CSV before returning

  • Documentation updates:

    • Function usage documentation updated to reflect new parameter signature
    • New @param annotation roxygen block documents the annotation parameter and its override behavior
    • Roxygen examples expanded to demonstrate paired design override use case
    • Function call updated to use explicit named arguments in MSstatsPreprocessBig invocation
  • Roxygen2 configuration: DESCRIPTION file updated with Config/roxygen2/version: 8.0.0 and RoxygenNote removed, aligning package metadata with roxygen2 version 8.0.0

  • Internal helper documentation: New roxygen2-generated documentation page added for .prefixedPath() internal helper function
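
The override behavior described in the list above can be sketched in base R as a "drop, then left join" step. This is an assumption about the general technique, not the actual MSstatsAddAnnotationBig implementation; override_annotation is a hypothetical helper and the toy data is invented for illustration.

```r
# Toy converter output still carrying Spectronaut-derived values.
converted <- data.frame(
  Run          = c("run_01", "run_01", "run_02"),
  ProteinName  = c("P1", "P2", "P1"),
  Intensity    = c(100, 200, 150),
  Condition    = "FROM_SPECTRONAUT",
  BioReplicate = 999L,
  stringsAsFactors = FALSE
)

# User-supplied annotation, keyed by Run.
annotation <- data.frame(
  Run          = c("run_01", "run_02"),
  Condition    = c("ctrl", "treat"),
  BioReplicate = c(7L, 8L),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of MSstatsBig).
override_annotation <- function(data, annotation, key = "Run") {
  # Drop every non-key column that the annotation is going to supply,
  # so the join cannot produce Condition.x / Condition.y duplicates.
  overlap <- setdiff(intersect(names(data), names(annotation)), key)
  data <- data[, setdiff(names(data), overlap), drop = FALSE]
  # Left join: rows from runs missing in the annotation are kept with NA.
  merge(data, annotation, by = key, all.x = TRUE, sort = FALSE)
}

annotated <- override_annotation(converted, annotation)
print(annotated)
```

Dropping the overlapping columns before joining is what makes the annotation win; a plain merge would instead keep both versions under suffixed names.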

Unit Tests

  • New test in tests/testthat/test-converters.R: Added test verifying bigSpectronauttoMSstatsFormat correctly overrides Condition and BioReplicate using provided annotation data
    • Stubs reduceBigSpectronaut to emit sentinel values for Condition/BioReplicate
    • Runs converter with annotation parameter using backend = "arrow" and max_feature_count = 1
    • Verifies output contains annotation-provided values rather than sentinel values
    • Includes cleanup of output directories and files
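
The sentinel-value pattern used by the test can be sketched without the package itself. In the sketch below, stub_reducer and toy_converter are hypothetical stand-ins (the real test stubs reduceBigSpectronaut and calls bigSpectronauttoMSstatsFormat); the point is only that sentinel values make it obvious whether the override ran.

```r
# Stub standing in for the reduction step. It emits sentinel values that no
# real annotation would contain, so any leak into the output is easy to spot.
stub_reducer <- function(input_file) {
  data.frame(
    Run          = c("run_a", "run_b"),
    Intensity    = c(10, 20),
    Condition    = "FROM_SPECTRONAUT",
    BioReplicate = 999L,
    stringsAsFactors = FALSE
  )
}

# Hypothetical toy converter (not the MSstatsBig implementation).
toy_converter <- function(input_file, annotation = NULL, reducer = stub_reducer) {
  out <- reducer(input_file)
  if (!is.null(annotation)) {
    # Replace the reducer-derived columns with the annotation's values.
    out <- out[, setdiff(names(out), c("Condition", "BioReplicate")), drop = FALSE]
    out <- merge(out, annotation, by = "Run", all.x = TRUE)
  }
  out
}

annotation <- data.frame(
  Run          = c("run_a", "run_b"),
  Condition    = c("ctrl", "treat"),
  BioReplicate = c(7L, 8L),
  stringsAsFactors = FALSE
)

result <- toy_converter("ignored.csv", annotation = annotation)
result <- result[order(result$Run), ]

# The annotation values must be present and the sentinels must be gone.
stopifnot(
  identical(result$Condition, c("ctrl", "treat")),
  identical(result$BioReplicate, c(7L, 8L)),
  !any(result$Condition == "FROM_SPECTRONAUT"),
  !any(result$BioReplicate == 999L)
)
```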

* Added annotation = NULL parameter to bigSpectronauttoMSstatsFormat
  (positional arg #2, mirroring bigDIANNtoMSstatsFormat from #16).
* When supplied, the converter merges the annotation onto the
  output via MSstatsAddAnnotationBig, overriding any Condition /
  BioReplicate columns that came from R.Condition / R.Replicate.
* Required for paired designs and other experimental layouts that
  Spectronaut's own annotation cannot express.
* Added override test under tests/testthat/test-converters.R.

See MSstats-ai/todos/active/TODO-MSBig-20260526_bigspectronaut_annotation_param.md

Co-Authored-By: Claude <noreply@anthropic.com>
coderabbitai Bot commented May 26, 2026

Walkthrough

This PR adds an optional annotation parameter to bigSpectronauttoMSstatsFormat to allow users to override Spectronaut-derived sample metadata (Condition, BioReplicate). The implementation includes roxygen2 configuration alignment, core function logic with annotation merging, updated public documentation, test coverage, and internal helper documentation.

Changes

Annotation Parameter for Spectronaut Converter

| Layer / File(s) | Summary |
| --- | --- |
| Roxygen2 Configuration Update (DESCRIPTION) | Update package description to align roxygen2 version from 7.3.3 to 8.0.0 by removing RoxygenNote and adding Config/roxygen2/version: 8.0.0. |
| Annotation Parameter Implementation (R/converters.R) | Add annotation = NULL parameter to function signature, document via @param annotation roxygen block, refactor MSstatsPreprocessBig call with explicit named arguments, and implement annotation merging via MSstatsAddAnnotationBig with Arrow backend file write-back logic. |
| Public Function Documentation (man/bigSpectronauttoMSstatsFormat.Rd) | Document annotation = NULL parameter in function usage, add detailed argument description covering override behavior and paired design guidance, and expand examples to demonstrate annotation override workflow. |
| Annotation Override Test (tests/testthat/test-converters.R) | Add test verifying that provided annotation correctly overrides Spectronaut-derived Condition and BioReplicate values in converter output. |
| Internal Helper Documentation (man/dot-prefixedPath.Rd) | Add roxygen2-generated man page for internal .prefixedPath(prefix, path) helper documenting basename-only prefixing behavior. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • Vitek-Lab/MSstatsBig#16: Modifies MSstatsAddAnnotationBig annotation join/overlap handling in R/converters.R, which is invoked by the annotation merging logic in this PR.

Suggested reviewers

  • Rudhik1904

Poem

🐰 A rabbit hops through data clean,
Annotation fields now intervene,
Spectronaut's values override with grace,
Tests ensure each condition's place,
Roxygen aligned—a tidy embrace! 📦

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ⚠️ Warning | The PR description lacks structured detail and omits required sections from the template. | Expand the description to match the template: add detailed motivation/context explaining why annotation support is needed, provide comprehensive bullet points for all changes (including DESCRIPTION file updates), document the test additions, and complete the contributor checklist. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately describes the main change: adding annotation parameter support to bigSpectronauttoMSstatsFormat function. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

* Slotted annotation = NULL just before connection = NULL instead
  of at position #2, so the pre-existing positional signature
  (input_file, output_file_name, backend, intensity, ...) keeps
  working for any external positional callers.
* This intentionally diverges from bigDIANNtoMSstatsFormat (#16),
  which puts annotation at position #2. Backward compatibility
  was prioritized for the Spectronaut converter because it had a
  longer pre-annotation life. DIANN can be re-flowed separately
  if consistency is needed later.
* Restored the simpler positional example call (no longer needs
  named-arg workaround that the position-#2 signature forced).

See MSstats-ai/todos/active/TODO-MSBig-20260526_bigspectronaut_annotation_param.md

Co-Authored-By: Claude <noreply@anthropic.com>
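
The compatibility argument in this commit message can be illustrated with two toy signatures. Both functions below are hypothetical, simplified stand-ins rather than the real converter; they only show how R's positional matching treats an existing three-argument call under each placement of the new parameter.

```r
# Variant A: the new argument is inserted at position 2.
convert_pos2 <- function(input_file, annotation = NULL, output_file_name,
                         backend = "arrow", connection = NULL) {
  list(input_file = input_file, output_file_name = output_file_name,
       backend = backend, annotation = annotation)
}

# Variant B: the new argument is slotted in late, just before connection.
convert_late <- function(input_file, output_file_name, backend = "arrow",
                         annotation = NULL, connection = NULL) {
  list(input_file = input_file, output_file_name = output_file_name,
       backend = backend, annotation = annotation)
}

# A pre-existing caller that relies on the old positional order.
old_call <- function(f) f("report.csv", "out.csv", "arrow")

a <- old_call(convert_pos2)
b <- old_call(convert_late)

# Variant A silently shifts the arguments: the output path lands in
# `annotation` and the backend string lands in `output_file_name`.
str(a)

# Variant B keeps the old meaning of every positional argument.
str(b)
```

Callers that pass everything by name are unaffected by either placement, which is why named arguments are the safer habit for functions with long signatures.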

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
tests/testthat/test-converters.R (1)

134-141: ⚡ Quick win

Assert persisted output also reflects annotation override

This test checks the returned object, but not the rewritten output_file_name artifact. Since Lines 195-196 introduce persistence behavior, add a reopen/assert step to lock that contract.

Small test extension
   expect_false(any(result$Condition == "FROM_SPECTRONAUT"))
   expect_false(any(result$BioReplicate == 999))
+
+  persisted <- dplyr::collect(arrow::open_dataset(output_file, format = "csv"))
+  persisted <- persisted[order(persisted$Run), ]
+  expect_equal(persisted$Condition, c("ctrl", "treat"))
+  expect_equal(persisted$BioReplicate, c(7L, 8L))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/testthat/test-converters.R` around lines 134 - 141, The test currently
asserts the in-memory result but not the persisted artifact; after the existing
checks (using result and output_file), reopen the written artifact at
output_file into a new variable (e.g., persisted_result) and repeat the same
assertions: expect_equal on Condition and BioReplicate and expect_false for
"FROM_SPECTRONAUT" and 999 to ensure the persisted output reflects the
annotation override introduced by the code that writes output_file_name; use the
same column names (Condition, BioReplicate) and then keep the existing cleanup
unmodified.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@R/converters.R`:
- Around line 157-160: The function bigSpectronauttoMSstatsFormat changed its
parameter order and now breaks positional callers; restore compatibility by
either moving the annotation parameter after backend in the function signature
or adding a small shim at the start of bigSpectronauttoMSstatsFormat that
detects when callers passed the old positional form (i.e., when backend is
missing but annotation contains a backend-like value) and swaps arguments
accordingly: check if missing(backend) && !is.null(annotation) && annotation
%in% c(...) or matches the expected backend type/values, then assign backend <-
annotation; annotation <- NULL (or shift the third argument into backend and
fourth into annotation if present); update any internal references to use the
corrected variables.
- Around line 192-197: The current code unlinks output_file_name before calling
arrow::write_dataset (when backend == "arrow"), making overwrites non-atomic;
change the logic in the block that calls MSstatsAddAnnotationBig and
arrow::write_dataset so you write the Arrow dataset to a temporary path (e.g.,
output_file_name_tmp or tempdir()/basename(...)), then after write_dataset
succeeds replace the original output atomically by renaming/moving the temp to
output_file_name (using file.rename or a safe atomic swap), and only unlink the
original if needed on failure; ensure you still handle recursive and force
semantics and keep MSstatsAddAnnotationBig and backend checks intact.

---

Nitpick comments:
In `@tests/testthat/test-converters.R`:
- Around line 134-141: The test currently asserts the in-memory result but not
the persisted artifact; after the existing checks (using result and
output_file), reopen the written artifact at output_file into a new variable
(e.g., persisted_result) and repeat the same assertions: expect_equal on
Condition and BioReplicate and expect_false for "FROM_SPECTRONAUT" and 999 to
ensure the persisted output reflects the annotation override introduced by the
code that writes output_file_name; use the same column names (Condition,
BioReplicate) and then keep the existing cleanup unmodified.
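
The write-to-temporary-then-rename idea from the second inline comment can be sketched in base R for a single CSV file. This is a minimal sketch under stated assumptions: write_csv_atomically is a hypothetical helper, it uses utils::write.csv instead of arrow::write_dataset, and it relies on file.rename overwriting an existing target, which depends on the operating system's rename call. A directory-based Arrow dataset would need extra care, since a directory cannot simply be renamed over a non-empty one.

```r
# Hypothetical helper: write to a sibling temp file, then rename over the
# target, so readers never see a missing or half-written file.
write_csv_atomically <- function(data, path) {
  # Keep the temp file in the target's directory so the rename stays on
  # one file system.
  tmp <- tempfile(pattern = paste0(basename(path), "."),
                  tmpdir = dirname(path), fileext = ".tmp")
  # Clean up the temp file on failure; a no-op once the rename succeeded.
  on.exit(unlink(tmp), add = TRUE)
  utils::write.csv(data, tmp, row.names = FALSE)
  if (!file.rename(tmp, path)) {
    stop("could not move temporary file into place: ", path)
  }
  invisible(path)
}

target <- file.path(tempdir(), "annotated_output.csv")

# First write: the unannotated output.
write_csv_atomically(
  data.frame(Run = "run_a", Condition = "FROM_SPECTRONAUT"), target
)
# Second write: the annotated output replaces it without an unlink step.
write_csv_atomically(
  data.frame(Run = "run_a", Condition = "ctrl"), target
)

persisted <- utils::read.csv(target, stringsAsFactors = FALSE)
print(persisted)
unlink(target)
```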
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 37f6107a-df08-41b4-88c4-759bd957da58

📥 Commits

Reviewing files that changed from the base of the PR and between a43b90b and 11d4d10.

📒 Files selected for processing (5)
  • DESCRIPTION
  • R/converters.R
  • man/bigSpectronauttoMSstatsFormat.Rd
  • man/dot-prefixedPath.Rd
  • tests/testthat/test-converters.R

@tonywu1999 tonywu1999 merged commit 5bb9b4f into devel May 26, 2026
2 checks passed
@tonywu1999 tonywu1999 deleted the MSstatsBig/work/20260526_bigspectronaut_annotation_param branch May 26, 2026 13:25