Add Compound proteinIdType and entity-agnostic grounding for metabolite networks#105

Merged
tonywu1999 merged 4 commits into devel from MSstatsBioNet/work/20260528_compound-id-type
Jun 16, 2026

@swaraj-neu

@swaraj-neu swaraj-neu commented Jun 10, 2026

Contributor

Add Compound proteinIdType and entity-agnostic grounding columns:
Generalize the protein-only HgncId/HgncName contract into EntityNamespace/EntityId/EntityName grounded through Gilda, keeping multi-grounding as semicolon-joined aligned lists that fan out into the INDRA query. Gene-only annotations are skipped for compounds, and the new contract flows through annotateProteinInfoFromIndra, getSubnetworkFromIndra, and cytoscapeNetwork.

Motivation and Context

Please include relevant motivation and context of the problem along with a short summary of the solution.

Changes

Please provide a detailed bullet point list of your changes.

Testing

Please describe any unit tests you added or modified to verify your changes.

Checklist Before Requesting a Review

  • I have read the MSstats contributing guidelines
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules
  • Ran styler::style_pkg(transformers = styler::tidyverse_style(indent_by = 4))
  • Ran devtools::document()

Motivation and Context

MSstatsBioNet previously grounded entities using a protein-only contract based on HGNC identifiers and names (HgncId/HgncName). This limited downstream network construction/visualization to genes/proteins.
This PR generalizes the grounding model to be entity-agnostic by introducing unified grounding columns (EntityNamespace, EntityId, EntityName) that work across entity types, including compound/metabolite networks. Grounding is performed via Gilda and INDRA Cogex, with multi-candidate grounding supported by semicolon-joined, positionally aligned values.

Solution Summary (high level)

  • Extend proteinIdType to support "Metabolite" alongside existing protein-related modes.
  • Replace HGNC-specific columns with unified entity grounding columns (EntityNamespace/EntityId/EntityName) throughout the annotate → subnetwork → Cytoscape pipeline.
  • Use entity-appropriate grounding:
    • UniProt-based inputs → INDRA Cogex → HGNC entity groundings.
    • "Hgnc_Name" and "Metabolite" → Gilda grounding (with optional HGNC/human filtering for "Hgnc_Name").
  • Support multi-grounding by keeping aligned semicolon-joined vectors across Entity* columns, expanding into INDRA queries downstream.
  • Exclude gene-specific annotation logic for metabolite/compound entities (e.g., transcription factor/kinase/phosphatase flags become NA for "Metabolite").
  • Update Cytoscape labeling to use entityName (rather than hgncName).
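
The multi-grounding contract in the summary above can be sketched as follows. This is a minimal illustration, not the package's `.buildCogexGroundings()`: the helper name `fan_out_groundings` and the example identifiers are assumptions made for the sketch. It splits positionally aligned, semicolon-joined EntityNamespace/EntityId strings into unique (ns, id) pairs and fails loudly when the two lists are not aligned.

```r
# Hypothetical helper (not the package implementation): fan out aligned,
# semicolon-joined namespace/id strings into unique (ns, id) pairs.
fan_out_groundings <- function(namespaces, ids) {
    stopifnot(length(namespaces) == length(ids))
    pairs <- list()
    for (i in seq_along(ids)) {
        # Rows without a grounding contribute nothing to the query
        if (is.na(namespaces[i]) || is.na(ids[i])) next
        ns_parts <- strsplit(namespaces[i], ";", fixed = TRUE)[[1]]
        id_parts <- strsplit(ids[i], ";", fixed = TRUE)[[1]]
        if (length(ns_parts) != length(id_parts)) {
            stop("Row ", i, ": EntityNamespace and EntityId are not positionally aligned")
        }
        pairs[[length(pairs) + 1]] <- data.frame(
            ns = ns_parts, id = id_parts, stringsAsFactors = FALSE
        )
    }
    if (length(pairs) == 0) {
        return(data.frame(ns = character(0), id = character(0), stringsAsFactors = FALSE))
    }
    # Deduplicate repeated (ns, id) pairs and reset row names
    out <- unique(do.call(rbind, pairs))
    rownames(out) <- NULL
    out
}

# Illustrative values only: the first and third rows share one grounding
groundings <- fan_out_groundings(
    namespaces = c("HGNC", "CHEBI;MESH", "HGNC"),
    ids = c("6342", "CHEBI:17234;D005947", "6342")
)
```

Keeping the pairs in a two-column data frame makes deduplication a single `unique()` call and keeps namespace and identifier together for every candidate.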

Detailed Changes

  • R/annotateProteinInfoFromIndra.R

    • Validate proteinIdType against an expanded allowed set including "Metabolite".
    • Refactor the pipeline to populate unified grounding columns: EntityNamespace, EntityId, EntityName (replacing HGNC-specific population).
    • Clear UniprotId for "Hgnc_Name" and "Metabolite" inputs.
    • Centralize entity grounding:
      • UniProt-based inputs use INDRA Cogex mapping to set HGNC EntityNamespace/EntityId/EntityName.
      • "Hgnc_Name" and "Metabolite" use new Gilda-based grounding with aligned semicolon-joined multi-candidates.
    • Update transcription factor/kinase/phosphatase population to accept proteinIdType, and return early for "Metabolite" (leaving flags as NA).
  • R/utils_annotateProteinInfoFromIndra.R

    • Remove HGNC-name-specific Gilda helper (.callGetHgncIdsFromGildaApi).
    • Add generalized Gilda grounding helper: .callGroundEntitiesFromGildaApi(textInputs, keep_only = NULL, organisms = NULL)
      • Calls Gilda ground_multi
      • Parses per-input candidates into term$db (namespace), term$id (identifier), and term$entry_name (fallback to term$text)
      • Optionally filters candidates via keep_only
      • Returns named per-input grounding lists with ns/id/name vectors.
  • R/getSubnetworkFromIndra.R

    • Update roxygen documentation to reflect the new annotated schema based on EntityNamespace/EntityId/EntityName (semicolon-joined/aligned multi-groundings).
    • Change Cogex/INDRA plumbing to send EntityNamespace/EntityId into Cogex rather than HGNC IDs.
  • R/utils_getSubnetworkFromIndra.R

    • Shift INDRA subnetwork grounding/validation from HGNC-only to (EntityNamespace, EntityId) pair groundings.
    • Enforce input validation requiring EntityNamespace, EntityId, and EntityName.
    • Add .buildCogexGroundings(namespaces, ids, force_include_other = NULL) to:
      • Validate positional alignment for semicolon-joined lists
      • Fan-out multi-groundings into INDRA node groundings
      • Append/validate force_include_other entries using namespace-aware (ns,id) parsing
      • Deduplicate repeated (ns,id) pairs.
    • Refactor .callIndraCogexApi() signature to accept namespaces and ids (instead of HGNC ids).
    • Update filtering/mapping logic:
      • Drop rows with NA EntityId
      • Match force_include_other against semicolon-split (EntityNamespace, EntityId) pairs
      • Rework endpoint-to-protein mapping using namespace-aware helpers (e.g., .rowMatchesEndpoint)
      • Compute source/target UniProt IDs by namespace-aware matching against each row’s semicolon-split namespace/id sets.
    • Update node construction schema:
      • Output EntityName/EntityId as entityName/entityId in downstream nodes, including for synthetic nodes.
  • R/cytoscapeNetwork.R and R/utils_cytoscapeNetwork.R

    • Document and implement labeling changes:
      • Support displayLabelType = "entityName" instead of "hgncName"
      • In .buildElements(), select label column from entityName when present; otherwise fall back to id.
      • Prefer row$entityName for display_label (fallback to row$id).
  • Documentation / man pages / vignettes

    • Update Rd files to describe the new Entity* grounding contract and "Metabolite" behavior.
    • Add documentation pages for new internal/entity-grounding helpers:
      • .populateEntityInformationInDataFrame, .populateEntityInformationWithGilda, .populateEntityInformationWithIndraCogex
    • Remove documentation pages for HGNC-only internal helpers:
      • .populateHgncIdsInDataFrame, .populateHgncNamesInDataFrame
    • Update visualization examples to use entityName and displayLabelType = "entityName".
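
The candidate parsing described for `.callGroundEntitiesFromGildaApi` can be illustrated without any network call. The sketch below is an assumption-laden stand-in: `parse_gilda_candidates` is a hypothetical name, and the mock list only mimics the `term$db` / `term$id` / `term$entry_name` / `term$text` fields named in the change list, not a full Gilda response.

```r
# Hypothetical parser for one input's candidate list (already parsed from JSON).
# Field names follow the PR description; everything else is illustrative.
parse_gilda_candidates <- function(candidates, keep_only = NULL) {
    ns <- character(0)
    id <- character(0)
    name <- character(0)
    for (entry in candidates) {
        term <- entry$term
        if (is.null(term$db) || is.null(term$id)) next
        # Optional namespace filter, e.g. keep_only = "HGNC"
        if (!is.null(keep_only) && !(term$db %in% keep_only)) next
        # Prefer entry_name, fall back to text, then NA to keep vectors aligned
        label <- term$entry_name
        if (is.null(label)) label <- term$text
        if (is.null(label)) label <- NA_character_
        ns <- c(ns, term$db)
        id <- c(id, term$id)
        name <- c(name, label)
    }
    list(ns = ns, id = id, name = name)
}

# Mock candidates for a single text input (illustrative values)
mock <- list(
    list(term = list(db = "CHEBI", id = "CHEBI:17234", entry_name = "glucose", text = "glucose")),
    list(term = list(db = "MESH", id = "D005947", text = "Glucose"))
)

all_hits <- parse_gilda_candidates(mock)
chebi_only <- parse_gilda_candidates(mock, keep_only = "CHEBI")

# Collapse to the semicolon-joined, positionally aligned column values
joined <- vapply(all_hits, paste, character(1), collapse = ";")
```

Appending to all three vectors in the same loop iteration is what keeps the namespace, identifier, and name positions aligned after filtering.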

Unit Tests

  • Added

    • tests/testthat/test-multi-grounding.R
      • Covers .buildCogexGroundings fan-out from semicolon-joined (ns,id), handling force_include_other, deduplication, and error cases.
      • Covers namespace-aware .rowMatchesEndpoint logic.
      • Covers .addAdditionalMetadataToIndraEdge for reconstructing original protein IDs from multi-grounded endpoints (source/target).
      • Covers .constructNodesDataFrame node schema and alignment of entityName/entityId.
      • Covers Metabolite pathway for annotateProteinInfoFromIndra(..., "Metabolite"):
        • Unit tests with mocked Gilda grounding to verify aligned semicolon-joined EntityNamespace/EntityId/EntityName
        • Assertions that UniprotId and gene-only flags are NA for metabolites
        • A fixture/E2E-style test (skipped if unavailable) that mocks Gilda/INDRA calls and validates the end-to-end contract.
  • Modified

    • tests/testthat/test-annotateProteinInfoFromIndra.R
      • Update expectations to assert EntityNamespace/EntityId/EntityName instead of HGNC-only outputs.
      • Add/verify error on invalid proteinIdType.
    • tests/testthat/test-getSubnetworkFromIndra.R
      • Update mocks for .callIndraCogexApi(ns, ids, fio)
      • Update required-column and “400+ proteins” error tests to use EntityNamespace/EntityId.
    • tests/testthat/test-utils_annotateProteinInfoFromIndra.R
      • Replace HGNC-name helper tests with .callGroundEntitiesFromGildaApi tests (including keep_only = "HGNC" behavior and a mocked multi-namespace case).
    • tests/testthat/test-exportNetworkToHTML.R
      • Switch fixtures/assertions from hgncName to entityName and adjust expected displayLabelType.
    • tests/testthat/test-utils_cytoscapeNetwork.R
      • Update node fixtures and add checks for .buildElements(..., "entityName"), including entityName NA fallback behavior.
    • tests/testthat/test-utils_getSubnetworkFromIndra.R
      • Update force_include_other test setup to use EntityNamespace/EntityId.
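
For readers unfamiliar with the namespace-aware matching these tests cover, here is a rough sketch of the idea. It is not the package's `.rowMatchesEndpoint`; the function name, argument names, and sample rows are hypothetical. A row matches an edge endpoint only when the same position in its semicolon-split lists carries both the endpoint's namespace and its identifier.

```r
# Hypothetical matcher: does a multi-grounded row contain the endpoint (ns, id)?
row_matches_endpoint <- function(row_ns, row_id, endpoint_ns, endpoint_id) {
    if (is.na(row_ns) || is.na(row_id)) return(FALSE)
    ns_parts <- strsplit(row_ns, ";", fixed = TRUE)[[1]]
    id_parts <- strsplit(row_id, ";", fixed = TRUE)[[1]]
    if (length(ns_parts) != length(id_parts)) return(FALSE)
    # Both namespace and id must agree at the same position
    any(ns_parts == endpoint_ns & id_parts == endpoint_id)
}

# Illustrative annotated input
annotated <- data.frame(
    Protein = c("P10721", "glucose"),
    EntityNamespace = c("HGNC", "CHEBI;MESH"),
    EntityId = c("6342", "CHEBI:17234;D005947"),
    stringsAsFactors = FALSE
)

# Recover the original input row for an edge endpoint grounded as MESH:D005947
hits <- mapply(
    row_matches_endpoint,
    annotated$EntityNamespace, annotated$EntityId,
    MoreArgs = list(endpoint_ns = "MESH", endpoint_id = "D005947"),
    USE.NAMES = FALSE
)
matched <- annotated$Protein[hits]
```

Matching on the pair rather than the identifier alone avoids false hits when the same identifier string appears in two namespaces.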

Coding Guidelines Violations

No coding guideline violations were reported in the provided change summaries.

@swaraj-neu swaraj-neu requested a review from tonywu1999 June 10, 2026 04:56
@swaraj-neu swaraj-neu self-assigned this Jun 10, 2026
@coderabbitai

coderabbitai Bot commented Jun 10, 2026


Review Change Stack

📝 Walkthrough

Walkthrough

The PR replaces HGNC-specific identifier columns (HgncId, HgncName) with a generalized entity grounding model (EntityNamespace, EntityId, EntityName) across annotation, subnetwork, and visualization pipelines. It adds "Metabolite" as a supported proteinIdType using Gilda grounding, introduces semicolon-joined multi-candidate handling, and rewires the INDRA Cogex API calls to operate on namespace/id pairs.

Changes

Unified Entity Grounding Refactor

Layer / File(s) / Summary

  • Gilda API: generalized grounding helper
    • File(s): R/utils_annotateProteinInfoFromIndra.R, tests/testthat/test-utils_annotateProteinInfoFromIndra.R.R
    • Summary: Replaces .callGetHgncIdsFromGildaApi with .callGroundEntitiesFromGildaApi(textInputs, keep_only, organisms), which calls Gilda's ground_multi endpoint and returns per-input named lists of aligned ns/id/name vectors with optional filtering. Tests cover both live (HGNC-filtered) and mocked (mixed namespace) paths.
  • annotateProteinInfoFromIndra: unified Entity* pipeline
    • File(s): R/annotateProteinInfoFromIndra.R, tests/testthat/test-annotateProteinInfoFromIndra.R, tests/testthat/test-multi-grounding.R
    • Summary: Expands proteinIdType validation to include "Metabolite", clears UniprotId for name/metabolite inputs, and dispatches to .populateEntityInformationInDataFrame which routes to INDRA Cogex (UniProt-based) or Gilda (name/metabolite). TF/kinase/phosphatase helpers gain proteinIdType and early-return NA for metabolites. Tests validate all new column contracts and Metabolite E2E behavior.
  • INDRA Cogex: namespace-aware grounding and subnetwork API
    • File(s): R/getSubnetworkFromIndra.R, R/utils_getSubnetworkFromIndra.R, tests/testthat/test-getSubnetworkFromIndra.R, tests/testthat/test-utils_getSubnetworkFromIndra.R
    • Summary: Introduces .buildCogexGroundings() to fan out semicolon-joined pairs into INDRA node groundings; refactors .callIndraCogexApi() to accept namespaces/ids instead of hgncIds; updates input validation to count unique (ns, id) pairs; rewrites .filterGetSubnetworkFromIndraInput() with namespace-aware exemption matching.
  • Namespace-aware edge metadata and node construction
    • File(s): R/utils_getSubnetworkFromIndra.R, tests/testthat/test-multi-grounding.R
    • Summary: Adds .rowMatchesEndpoint() for semicolon-split namespace-aware row matching; updates .addAdditionalMetadataToIndraEdge() to recover source/target proteins via namespace matching; updates .constructNodesDataFrame() to emit entityName/entityId columns instead of HgncName.
  • Cytoscape/HTML visualization: entityName label support
    • File(s): R/utils_cytoscapeNetwork.R, R/cytoscapeNetwork.R, tests/testthat/test-utils_cytoscapeNetwork.R, tests/testthat/test-exportNetworkToHTML.R, vignettes/...
    • Summary: Updates .buildElements() to select entityName as label_col and display_label source when displayLabelType = "entityName". All related tests, docs, and vignettes updated from hgncName to entityName/entityId.
  • Documentation and man page updates
    • File(s): man/annotateProteinInfoFromIndra.Rd, man/dot-populate*.Rd, man/getSubnetworkFromIndra.Rd, man/exportNetworkToHTML.Rd, man/previewNetworkInBrowser.Rd, man/cytoscapeNetwork.Rd
    • Summary: Adds new man pages for three populateEntityInformation* functions; removes populateHgncIdsInDataFrame and populateHgncNamesInDataFrame pages; updates signatures and argument descriptions across all affected public and internal functions.
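
The label selection summarized for .buildElements() can be pictured with a small stand-alone helper. This is only a sketch under stated assumptions: `pick_display_label` is a hypothetical name, and treating an empty string like NA is an extra assumption not stated in the PR, which only mentions the NA fallback.

```r
# Hypothetical label picker: use entityName when requested and available,
# otherwise fall back to the node id.
pick_display_label <- function(nodes, displayLabelType = "id") {
    labels <- as.character(nodes$id)
    if (identical(displayLabelType, "entityName") && "entityName" %in% names(nodes)) {
        # NA (and, as an assumption here, empty) names keep the id as label
        has_name <- !is.na(nodes$entityName) & nzchar(nodes$entityName)
        labels[has_name] <- nodes$entityName[has_name]
    }
    labels
}

# Illustrative nodes
nodes <- data.frame(
    id = c("P10721", "glucose", "X1"),
    entityName = c("KIT", NA, ""),
    stringsAsFactors = FALSE
)

labels_by_name <- pick_display_label(nodes, "entityName")
labels_by_id <- pick_display_label(nodes, "id")
```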

Sequence Diagram(s)

sequenceDiagram
  participant User
  participant annotateProteinInfoFromIndra
  participant populateEntityInformationInDataFrame
  participant IndraCogex as INDRA Cogex API
  participant GildaAPI as Gilda ground_multi API
  participant getSubnetworkFromIndra
  participant buildCogexGroundings
  participant IndraCogexSubnet as INDRA Cogex Subnetwork API

  User->>annotateProteinInfoFromIndra: df, proteinIdType
  annotateProteinInfoFromIndra->>populateEntityInformationInDataFrame: dispatch by proteinIdType
  alt Uniprot / Uniprot_Mnemonic
    populateEntityInformationInDataFrame->>IndraCogex: UniprotId → HGNC id/name
    IndraCogex-->>populateEntityInformationInDataFrame: EntityNamespace=HGNC, EntityId, EntityName
  else Hgnc_Name
    populateEntityInformationInDataFrame->>GildaAPI: ground_multi(keep_only=HGNC)
    GildaAPI-->>populateEntityInformationInDataFrame: ns/id/name vectors (semicolon-joined if multi)
  else Metabolite
    populateEntityInformationInDataFrame->>GildaAPI: ground_multi(no filter)
    GildaAPI-->>populateEntityInformationInDataFrame: ns/id/name vectors (CHEBI etc.)
  end
  annotateProteinInfoFromIndra-->>User: df with EntityNamespace, EntityId, EntityName

  User->>getSubnetworkFromIndra: annotated df (EntityNamespace, EntityId)
  getSubnetworkFromIndra->>buildCogexGroundings: namespaces, ids (semicolon-split fan-out)
  buildCogexGroundings-->>getSubnetworkFromIndra: INDRA node groundings list
  getSubnetworkFromIndra->>IndraCogexSubnet: POST with groundings
  IndraCogexSubnet-->>getSubnetworkFromIndra: edges + node metadata
  getSubnetworkFromIndra-->>User: subnetwork with entityName/entityId nodes
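
The branching in the sequence diagram amounts to a small routing table keyed on proteinIdType. A minimal sketch, assuming the four values shown in the diagram are the full allowed set; `grounding_route` and the returned field names are hypothetical and do not mirror the internals of .populateEntityInformationInDataFrame.

```r
# Hypothetical routing table: which service grounds each proteinIdType and
# which namespace filter (if any) is passed along.
grounding_route <- function(proteinIdType) {
    switch(proteinIdType,
        Uniprot = ,
        Uniprot_Mnemonic = list(source = "cogex", keep_only = NULL),
        Hgnc_Name = list(source = "gilda", keep_only = "HGNC"),
        Metabolite = list(source = "gilda", keep_only = NULL),
        stop("Unsupported proteinIdType: ", proteinIdType)
    )
}

metabolite_route <- grounding_route("Metabolite")
hgnc_route <- grounding_route("Hgnc_Name")
uniprot_route <- grounding_route("Uniprot")
```

The empty `Uniprot = ,` arm relies on `switch()` fall-through, so both UniProt-based modes share one route.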

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • Vitek-Lab/MSstatsBioNet#73: Directly overlaps with cytoscapeNetwork/.buildElements() label sourcing changes — both touch displayLabelType and node label selection logic.
  • Vitek-Lab/MSstatsBioNet#67: Both PRs modify the annotateProteinInfoFromIndra Gilda-based HGNC name grounding pathway, with this PR replacing the earlier .callGetHgncIdsFromGildaApi implementation.
  • Vitek-Lab/MSstatsBioNet#54: Both PRs modify .callIndraCogexApi and force_include_other identifier-mapping logic in utils_getSubnetworkFromIndra.

Suggested labels

Review effort 3/5

Poem

🐇 Hoppity-hop through the namespace maze,
Old HgncName lost in the haze!
EntityId, EntityNamespace — all aligned,
Metabolites grounded, semicolons refined.
Gilda and Cogex now share the stage,
A unified schema for every age! 🌿

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

  • Description check: ❓ Inconclusive
    • Explanation: The PR description provides motivation/context and briefly describes the solution, but the 'Changes' and 'Testing' sections lack the required detailed bullet-point lists, and the checklist items are incomplete.
    • Resolution: Expand the 'Changes' section with detailed bullet points covering all modified files/functions, and provide a detailed 'Testing' section describing unit tests, integration tests, and coverage. Complete the pre-review checklist items.

✅ Passed checks (4 passed)

  • Title check: ✅ Passed. The title clearly and specifically describes the main changes: adding support for Compound proteinIdType and implementing entity-agnostic grounding for metabolite networks, which aligns with the substantial refactoring evident in the raw summary.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.

@codecov-commenter

codecov-commenter commented Jun 10, 2026


Codecov Report

❌ Patch coverage is 91.66667% with 15 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.17%. Comparing base (d1e6219) to head (eb69056).

Files with missing lines (patch coverage, lines missing):
  • R/utils_getSubnetworkFromIndra.R: 83.33%, 12 Missing ⚠️
  • R/utils_annotateProteinInfoFromIndra.R: 91.17%, 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##            devel     #105      +/-   ##
==========================================
+ Coverage   75.35%   77.17%   +1.81%     
==========================================
  Files           9        9              
  Lines        1047     1139      +92     
==========================================
+ Hits          789      879      +90     
- Misses        258      260       +2     

@swaraj-neu

Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 10, 2026

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Comment thread R/annotateProteinInfoFromIndra.R Outdated
Comment thread R/annotateProteinInfoFromIndra.R Outdated
Comment thread R/annotateProteinInfoFromIndra.R Outdated
Comment thread R/annotateProteinInfoFromIndra.R Outdated
Comment thread R/getSubnetworkFromIndra.R
Comment thread R/utils_annotateProteinInfoFromIndra.R Outdated
Comment thread R/utils_cytoscapeNetwork.R Outdated
Comment on lines +252 to +255
# `emitted_cpds` and `node_type = "compound"` below refer to Cytoscape
# grouping containers used to parent PTM satellite nodes around a protein.
# This Cytoscape "compound" concept is UNRELATED to the chemical
# `proteinIdType = "Compound"` analyte type in annotateProteinInfoFromIndra.

Contributor


For now, let's use metabolite instead of compound as an enum for proteinIdType. Could you make this change? And then this comment could get removed.

Comment thread R/utils_getSubnetworkFromIndra.R Outdated
Comment thread R/utils_getSubnetworkFromIndra.R Outdated
…d gene identifiers; update documentation; remove deprecated functions

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
R/utils_annotateProteinInfoFromIndra.R (1)

304-315: ⚠️ Potential issue | 🟠 Major

Guard non-2xx Gilda responses before iterating parsed content.

If Gilda returns an error response (4xx/5xx), the current code still parses it and enters the loop at line 323, where it can fail accessing entry$term on malformed structures. Add an HTTP status check before processing the parsed response.

Suggested fix
     res <- tryCatch({
         response <- POST(
             apiUrl,
             body = requestBody,
             add_headers("Content-Type" = "application/json"),
             encode = "raw"
         )
-        content(response)
+        if (httr::http_error(response)) {
+            message("Gilda API returned HTTP ", httr::status_code(response))
+            return(NULL)
+        }
+        parsed <- content(response, as = "parsed", simplifyVector = FALSE)
+        if (!is.list(parsed)) {
+            message("Unexpected Gilda API response shape.")
+            return(NULL)
+        }
+        parsed
     }, error = function(e) {
         message("Error in API call: ", e)
         NULL
     })

Remember to add http_error and status_code to the @importFrom httr statement.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@R/utils_annotateProteinInfoFromIndra.R` around lines 304 - 315, The tryCatch
block in the code currently parses Gilda API responses without validating the
HTTP status code, allowing error responses (4xx/5xx) to be processed and passed
to the iteration loop that accesses entry$term. Add an HTTP status check using
status_code() function after the POST call but before calling content(response),
such that if the status code indicates an error, the function returns NULL
instead of attempting to parse and process the malformed response. Additionally,
update the `@importFrom` httr statement to include both http_error and status_code
functions.
R/utils_getSubnetworkFromIndra.R (1)

93-108: ⚠️ Potential issue | 🟠 Major

Add timeout and HTTP status handling to the Cogex API call.

POST() is currently executed without a timeout, and content() is consumed without checking response status. A transient network issue or 5xx response can hang or propagate unclear parse failures.

Suggested fix
-    res <- POST(
+    res <- POST(
         indraCogexUrl,
         body = groundings,
         add_headers("Content-Type" = "application/json"),
+        httr::timeout(30),
         encode = "raw"
     )
-    res <- content(res)
+    httr::stop_for_status(res)
+    res <- content(res)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@R/utils_getSubnetworkFromIndra.R` around lines 93 - 108, The
`.callIndraCogexApi` function makes a POST request without a timeout and
consumes the response without checking the HTTP status code, which can cause
hangs on network issues or unclear failures on server errors. Add a timeout
parameter to the POST() call (such as using the timeout() function from httr)
and check the response status code using status_code() before calling content().
If the status code indicates an error (not in the 2xx range), stop execution
with a meaningful error message that includes both the status code and response
content.
🧹 Nitpick comments (2)
tests/testthat/test-utils_annotateProteinInfoFromIndra.R.R (1)

98-106: ⚡ Quick win

Mocking the function under test makes this assertion tautological.

Because .callGroundEntitiesFromGildaApi is replaced directly, the test does not validate request construction/filtering/parsing behavior. Mock lower-level HTTP boundaries instead so the function body executes.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/testthat/test-utils_annotateProteinInfoFromIndra.R.R` around lines 98 -
106, The test mocks `.callGroundEntitiesFromGildaApi` directly, which prevents
the actual function body from executing and makes the test unable to validate
real request construction, filtering, and parsing behavior. Instead of mocking
the function under test itself, mock the lower-level HTTP boundaries or API
calls that `.callGroundEntitiesFromGildaApi` uses internally. This allows the
function to execute its actual logic while controlling external dependencies,
enabling the test to properly validate the function's behavior.
tests/testthat/test-getSubnetworkFromIndra.R (1)

5-7: ⚡ Quick win

Assert forwarded ns/ids in the Cogex mock to lock in this contract.

The new signature is mocked but not asserted. Capturing and checking ns/ids values would make this test fail on any future regression that accidentally reverts to old identifiers.

Also applies to: 17-19
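
One way to act on this nitpick is a mock that records its arguments so the test can assert on them afterwards. The sketch below is base R only and uses hypothetical names (`make_recording_mock`, the sample values); in the real test the returned function would be bound in place of `.callIndraCogexApi` with whatever mocking helper the suite already uses.

```r
# Hypothetical recording mock: returns a canned response and remembers
# every (ns, ids, fio) it was called with.
make_recording_mock <- function(response) {
    calls <- list()
    mock <- function(ns, ids, fio = NULL) {
        # Append this call's arguments to the enclosing `calls` list
        calls[[length(calls) + 1]] <<- list(ns = ns, ids = ids, fio = fio)
        response
    }
    list(mock = mock, calls = function() calls)
}

recorder <- make_recording_mock(response = list())

# Stand-in for the code under test calling the mocked API function
result <- recorder$mock(ns = c("HGNC", "CHEBI"), ids = c("6342", "CHEBI:17234"))

# Assertions a test could make on the forwarded arguments
seen <- recorder$calls()[[1]]
```

Asserting on `seen$ns` and `seen$ids` would make the test fail if a later change reverted to passing HGNC-only identifiers.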

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/testthat/test-getSubnetworkFromIndra.R` around lines 5 - 7, The mocked
`.callIndraCogexApi` function needs to include assertions that verify the `ns`
and `ids` parameters receive the expected values before returning the mocked
response. Add assertions within the mock function body to validate that these
parameter values match what is expected in the new signature. Apply this
pattern consistently across all similar mocked function definitions that have
been updated with the new signature so that any future regressions are caught.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@R/annotateProteinInfoFromIndra.R`:
- Around line 64-67: The validation logic for proteinIdType does not explicitly
handle NA values, causing the condition at the validation check (where allowed
values are compared) to trigger a generic runtime error instead of the intended
validation message. Add an explicit NA check to the validation condition before
or alongside the existing length and membership checks (the `if` statement
containing the `!proteinIdType %in% allowed` test) so that NA values are caught
and produce the proper validation error message.
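
A possible shape for that check, as a sketch only: `validate_protein_id_type` is a hypothetical name, and the allowed set is assumed from the values mentioned elsewhere in this PR rather than copied from the package.

```r
# Hypothetical validator: type, length, NA, and membership are tested in an
# order that short-circuits before any step can raise a generic error.
validate_protein_id_type <- function(
    proteinIdType,
    allowed = c("Uniprot", "Uniprot_Mnemonic", "Hgnc_Name", "Metabolite")) {
    valid <- is.character(proteinIdType) &&
        length(proteinIdType) == 1 &&
        !is.na(proteinIdType) &&
        proteinIdType %in% allowed
    if (!valid) {
        stop(
            "proteinIdType must be one of: ", paste(allowed, collapse = ", "),
            call. = FALSE
        )
    }
    invisible(proteinIdType)
}

# NA and multi-element inputs both yield the same validation message
na_message <- tryCatch(
    validate_protein_id_type(NA),
    error = function(e) conditionMessage(e)
)
```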

In `@R/utils_getSubnetworkFromIndra.R`:
- Around line 10-12: The validation check in the conditional statement that
starts at line 10 only verifies the presence of EntityId and EntityNamespace
columns, but downstream code unconditionally accesses additional columns:
Protein, adj.pvalue, log2FC, and EntityName. Expand the validation to check for
all required columns (EntityId, EntityNamespace, Protein, adj.pvalue, log2FC,
and EntityName) in a single comprehensive check, and update the error message to
clearly indicate all required column names so users receive an explicit
input-contract error rather than opaque subsetting errors later.
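
A single comprehensive check of this kind could look roughly like the following. The helper name `check_required_columns` is hypothetical; the column names are the ones listed in the comment above.

```r
# Hypothetical input-contract check: report every missing column at once.
check_required_columns <- function(
    df,
    required = c(
        "Protein", "adj.pvalue", "log2FC",
        "EntityNamespace", "EntityId", "EntityName"
    )) {
    missing_cols <- setdiff(required, colnames(df))
    if (length(missing_cols) > 0) {
        stop(
            "Input is missing required column(s): ",
            paste(missing_cols, collapse = ", "),
            call. = FALSE
        )
    }
    invisible(TRUE)
}

# Example: an input that still uses the old HGNC-only schema
old_schema <- data.frame(Protein = "P10721", HgncId = "6342", stringsAsFactors = FALSE)
schema_error <- tryCatch(
    check_required_columns(old_schema),
    error = function(e) conditionMessage(e)
)
```

Using `setdiff()` lets the message name all absent columns in one pass instead of failing on the first one.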

In `@tests/testthat/test-utils_annotateProteinInfoFromIndra.R.R`:
- Around line 79-94: The test function testing .callGroundEntitiesFromGildaApi
makes unconditional live API calls to an external service, which can fail
unpredictably in CI environments. Wrap the test logic with a conditional check
using skip_if to gate the test behind an opt-in environment variable (for
example, check if a flag like RUN_LIVE_TESTS is set), allowing the test to be
skipped by default in CI but runnable manually when the environment variable is
set. Alternatively, refactor the test to mock the API responses instead of
making actual API calls while maintaining the same validation logic for the
returned data structure and content.
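
The opt-in gate suggested here can be kept to a tiny helper. In this sketch the helper name `live_tests_enabled` and the accepted flag values are assumptions; `RUN_LIVE_TESTS` is the example variable name from the comment above.

```r
# Hypothetical helper: live API tests run only when the flag is explicitly set.
live_tests_enabled <- function(flag = "RUN_LIVE_TESTS") {
    tolower(Sys.getenv(flag, unset = "")) %in% c("1", "true", "yes")
}

# Possible use inside a testthat file (shown as a comment, since it needs
# testthat and network access):
# test_that(".callGroundEntitiesFromGildaApi grounds HGNC names (live)", {
#     testthat::skip_if_not(
#         live_tests_enabled(),
#         "Set RUN_LIVE_TESTS=true to run live Gilda tests"
#     )
#     ...
# })

Sys.setenv(RUN_LIVE_TESTS = "true")
enabled_when_set <- live_tests_enabled()
Sys.unsetenv("RUN_LIVE_TESTS")
enabled_by_default <- live_tests_enabled()
```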

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b725c519-c80d-49e5-b2b4-7972d7a3b56a

📥 Commits

Reviewing files that changed from the base of the PR and between d1e6219 and 67268cc.

⛔ Files ignored due to path filters (2)
  • inst/extdata/groupComparisonModel.csv is excluded by !**/*.csv
  • inst/extdata/groupComparisonModel_compound.csv is excluded by !**/*.csv
📒 Files selected for processing (31)
  • R/annotateProteinInfoFromIndra.R
  • R/cytoscapeNetwork.R
  • R/getSubnetworkFromIndra.R
  • R/utils_annotateProteinInfoFromIndra.R
  • R/utils_cytoscapeNetwork.R
  • R/utils_getSubnetworkFromIndra.R
  • man/annotateProteinInfoFromIndra.Rd
  • man/cytoscapeNetwork.Rd
  • man/dot-populateEntityInformationInDataFrame.Rd
  • man/dot-populateEntityInformationWithGilda.Rd
  • man/dot-populateEntityInformationWithIndraCogex.Rd
  • man/dot-populateHgncIdsInDataFrame.Rd
  • man/dot-populateHgncNamesInDataFrame.Rd
  • man/dot-populateKinaseInfoInDataFrame.Rd
  • man/dot-populatePhophataseInfoInDataFrame.Rd
  • man/dot-populateTranscriptionFactorInfoInDataFrame.Rd
  • man/dot-populateUniprotIdsInDataFrame.Rd
  • man/dot-validateAnnotateProteinInfoFromIndraInput.Rd
  • man/exportNetworkToHTML.Rd
  • man/getSubnetworkFromIndra.Rd
  • man/previewNetworkInBrowser.Rd
  • tests/testthat/test-annotateProteinInfoFromIndra.R
  • tests/testthat/test-exportNetworkToHTML.R
  • tests/testthat/test-getSubnetworkFromIndra.R
  • tests/testthat/test-multi-grounding.R
  • tests/testthat/test-utils_annotateProteinInfoFromIndra.R.R
  • tests/testthat/test-utils_cytoscapeNetwork.R
  • tests/testthat/test-utils_getSubnetworkFromIndra.R
  • vignettes/Cytoscape-Visualization.Rmd
  • vignettes/MSstatsBioNet.Rmd
  • vignettes/PTM-Analysis.Rmd
💤 Files with no reviewable changes (2)
  • man/dot-populateHgncNamesInDataFrame.Rd
  • man/dot-populateHgncIdsInDataFrame.Rd

Comment thread R/annotateProteinInfoFromIndra.R
Comment thread R/utils_getSubnetworkFromIndra.R Outdated
Comment thread tests/testthat/test-utils_annotateProteinInfoFromIndra.R.R
@swaraj-neu swaraj-neu requested a review from tonywu1999 June 15, 2026 23:55

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (2)
tests/testthat/test-utils_annotateProteinInfoFromIndra.R.R (2)

103-107: 💤 Low value

Consider using metabolite-appropriate mock data for clarity.

The mock returns "KIT" as one of the names for the "glucose" grounding. While structurally valid for testing multi-grounding, "KIT" is a gene name and semantically unrelated to glucose, which may confuse maintainers reading the test.

Consider using metabolite-appropriate values (e.g., "D-Glucose" or "Glucose" for both entries) to improve test clarity.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/testthat/test-utils_annotateProteinInfoFromIndra.R.R` around lines 103
- 107, The mock data for the glucose entry contains "KIT" as a name value, which
is a gene name and semantically inappropriate for a metabolite. In the glucose
list definition, replace the name value "KIT" with a metabolite-appropriate name
such as "D-Glucose" or "Glucose" to improve test clarity and prevent confusion
for maintainers. Ensure both name entries for the multi-grounding (corresponding
to the MESH and CHEBI namespaces) use metabolite-appropriate values.

99-116: 💤 Low value

Add explicit length alignment checks for consistency.

The live test at lines 91-92 explicitly validates that the `ns`, `id`, and `name` vectors have equal length, a key contract enforced by the downstream consumer (`.populateEntityInformationWithGilda` asserts length equality). Adding the same check here would make test coverage more consistent and defensive.

♻️ Suggested addition
     expect_equal(result[["EGFR"]]$ns, "HGNC")
     expect_equal(result[["glucose"]]$ns, c("MESH", "CHEBI"))
+    expect_equal(length(result[["glucose"]]$ns), length(result[["glucose"]]$id))
+    expect_equal(length(result[["glucose"]]$ns), length(result[["glucose"]]$name))
     expect_equal(result[["glucose"]]$id, c("3815", "17234"))
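
The same contract can be expressed as one reusable check. A minimal sketch in base R, assuming a hypothetical helper `is_aligned_grounding()` (not part of MSstatsBioNet or testthat) and a grounding list shaped like the mocked result in this test, with metabolite-appropriate names substituted for "KIT":

```r
# Hypothetical helper: TRUE when ns, id, and name have the same length,
# i.e. the three vectors are positionally aligned.
is_aligned_grounding <- function(grounding) {
    field_lengths <- vapply(grounding[c("ns", "id", "name")], length, integer(1))
    length(unique(field_lengths)) == 1
}

# Mock shaped like the multi-grounding result for "glucose"
glucose <- list(
    ns = c("MESH", "CHEBI"),
    id = c("3815", "17234"),
    name = c("Glucose", "D-Glucose")
)

# A deliberately broken mock: two namespaces but only one id and name
misaligned <- list(ns = c("MESH", "CHEBI"), id = "3815", name = "Glucose")

aligned_ok <- is_aligned_grounding(glucose)
aligned_bad <- is_aligned_grounding(misaligned)

# Aligned vectors can then be semicolon-joined without losing positions
joined_ns <- paste(glucose$ns, collapse = ";")
joined_id <- paste(glucose$id, collapse = ";")
```

Checking all three fields at once also catches a mismatch in `name`, which a comparison of `ns` against `id` alone would miss.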
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/testthat/test-utils_annotateProteinInfoFromIndra.R.R` around lines 99 -
116, Add explicit length alignment checks to the test function
`.callGroundEntitiesFromGildaApi keeps non-HGNC namespaces when keep_only is
NULL (mocked)` to validate that the `ns`, `id`, and `name` vectors have equal
lengths, matching the pattern used in the live test. After the existing
expect_equal assertions, add checks to verify that for the "glucose" entry, all
three vectors have the same length (for example, both `ns` and `id` should have
length 2). This ensures consistency with the test coverage expectations and
enforces the key contract that downstream consumers like
`.populateEntityInformationWithGilda` depend on.

ℹ️ Review info
Run ID: 2c010ce9-ab68-452a-a69d-1a66d86488f1

📥 Commits

Reviewing files that changed from the base of the PR and between 67268cc and eb69056.

📒 Files selected for processing (6)
  • R/annotateProteinInfoFromIndra.R
  • R/utils_getSubnetworkFromIndra.R
  • man/annotateProteinInfoFromIndra.Rd
  • tests/testthat/test-getSubnetworkFromIndra.R
  • tests/testthat/test-multi-grounding.R
  • tests/testthat/test-utils_annotateProteinInfoFromIndra.R.R
✅ Files skipped from review due to trivial changes (1)
  • man/annotateProteinInfoFromIndra.Rd
🚧 Files skipped from review as they are similar to previous changes (4)
  • tests/testthat/test-getSubnetworkFromIndra.R
  • tests/testthat/test-multi-grounding.R
  • R/utils_getSubnetworkFromIndra.R
  • R/annotateProteinInfoFromIndra.R

@tonywu1999 tonywu1999 merged commit e274ff2 into devel Jun 16, 2026
4 checks passed
@tonywu1999 tonywu1999 deleted the MSstatsBioNet/work/20260528_compound-id-type branch June 16, 2026 13:28
3 participants