Skip to content

#1142 Use Arranger SearchClient#1145

Open
demariadaniel wants to merge 52 commits intodevelopfrom
1142/feat-support-opensearch
Open

#1142 Use Arranger SearchClient#1145
demariadaniel wants to merge 52 commits intodevelopfrom
1142/feat-support-opensearch

Conversation

@demariadaniel
Copy link
Copy Markdown
Contributor

@demariadaniel demariadaniel commented Mar 9, 2026

Updates HCMI application to run using Arranger SearchClient with OpenSearch

  • Replaces use of elastic-search client with Arranger Server SearchClient
  • Updates Mongoose client
  • Removes Mongoosastic library and uses explicit client indexing
  • Fixes bugs & cleanup related to legacy code
  • Updates Lodash patch version
  • Adds explicit handling for 'last-updated' index

@demariadaniel demariadaniel self-assigned this Mar 9, 2026
@demariadaniel demariadaniel marked this pull request as draft March 9, 2026 19:38
Comment thread cms/src/services/searchClient/publish.js Outdated
@demariadaniel demariadaniel requested a review from mistryrn April 7, 2026 18:19
@demariadaniel demariadaniel marked this pull request as ready for review April 7, 2026 18:19
@demariadaniel demariadaniel changed the title WIP: #1142 Use Arranger SearchClient #1142 Use Arranger SearchClient Apr 7, 2026
Comment thread ui/src/theme/index.js
const growthChartPalette = ['#ef5350', '#80deea', '#3949ab'];

const multipleModelsChartPalette = ['#d4e157', '#0277bd'];
const multipleModelsChartPalette = ['#d4e157', '#0277bd', '#f06292'];
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a 3rd value for differentiating 'No Data' from 'No'

Copy link
Copy Markdown
Contributor

@mistryrn mistryrn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still need to complete my local testing once the docker-compose change we discussed on Slack is implemented (adding OpenSearch), but I've reviewed the code changes and they look great! Just a few minor change suggestions and cleanup/documentation asks. Other than this (and pending local testing), I think we're good to go 🙂

Awesome job, Dan! 🎉

Comment thread cms/src/services/search-client/client.js
Comment on lines +24 to +41
if (value && typeof value === 'object') {
if (Array.isArray(value)) {
const firstEntry = value[0];
const cleanedVals =
firstEntry && typeof firstEntry === 'object'
? value.map((val) => {
const cleanedValue = _.omit(val, mongoKeys);
return cleanedValue;
})
: value;
cleanedDoc[key] = cleanedVals;
} else if (value instanceof mongoose.Types.ObjectId) {
cleanedDoc[key] = value.toString();
} else if (!(value instanceof Date)) {
const cleanedValue = _.omit(value, mongoKeys);
cleanedDoc[key] = cleanedValue;
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working all this logic out, it must have been quite a bit of trial and error!

For future developers' reference, would you mind updating the TSDoc comment above this function to describe why we're cleaning/handling objects, arrays, mongoose ObjectIds, and Dates differently? Either that, or put some inline comments above each conditional explaining the purpose/logic, or both!

This is definitely the kind of thing that, if documented well, will be greatly appreciated by whoever looks at this months from now 🙂

Copy link
Copy Markdown
Contributor Author

@demariadaniel demariadaniel Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments added, let me know if you would revise these

Comment thread cms/src/services/search-client/publish.js Outdated
Comment thread scripts/utils/esUtils.js
import variantsIndexConfig from '../../elasticsearch/variantsIndex.json' with { type: "json" };
const modelsIndexName = process.env.ES_INDEX || pm2.ES_INDEX || 'hcmi';
const esHost = process.env.ES_HOST || `${pm2.ES_HOST}:${pm2.ES_PORT}`;
const updateIndexName = process.env.ES_UPDATE_INDEX || pm2.ES_UPDATE_INDEX || 'hcmi-update';
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch explicitly handling the last updated index 👍

Comment thread pnpm-workspace.yaml Outdated
Copy link
Copy Markdown
Contributor

@mistryrn mistryrn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple small suggestions for the docker-compose file 🙂

Comment thread docker-compose.yml Outdated
Comment thread docker-compose.yml Outdated
Copy link
Copy Markdown
Contributor

@mistryrn mistryrn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found one bug with the republish script in my local testing: looks like one of the deleted ES schema files in /cms was used there. Hopefully it'll be a quick fix to get that working again 🙏

Also suggested a minor logs cleanup.

Comment thread cms/src/routes/sync-data.js Outdated
Comment thread scripts/utils/republishUtils.js Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to my previous comment: looks like the republish script relied on this export as a dependency. We may need to restore it or use an alternative to get that working again.

Copy link
Copy Markdown
Contributor

@mistryrn mistryrn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing the feedback so far! I encountered a few more issues in my local testing today:

  • (1) The "Updated" column on the /admin CMS table and the "Updated" timestamp shown on the edit model view don't seem to be getting updated any more when saving or publishing a model. This isn't a blocker for testing on the Dev server, but is something we'll need to fix before we go to Prod.
Image Image
  • (2) I'm encountering some data/index-related issues when loading the public-facing UI pages. These might be a blocker for testing on the Dev server, as we might encounter these when we go to test. I know you've fixed similar issues on your local already, so I'm going to leave some instructions that might help reproduce them/recreate the scenario of setting up a new OpenSearch instance.
    • a. The "Search By Model Name" filter on the Search page is failing with 400 errors on the ModelQuickSearch query.
      • code: "GRAPHQL_VALIDATION_FAILED"
      • message: 'Cannot query field "autocomplete" on type "modelNode".'
    • b. The "Model Details" page fails to load when I click on a model from the Search page table. I'm getting 400 errors for ModelDataQuery and VariantsData:
      • ModelDataQuery errors:
        • message: 'Cannot query field "distributor_part_number" on type "modelNode".'
        • message: 'Cannot query field "proteomics_url" on type "modelNode".'
        • message: 'Cannot query field "hits" on type "modelFiles".'
        • message: 'Cannot query field "matched_models" on type "modelNode". Did you mean "has_matched_models" or "matched_models_list"?'
        • code: "GRAPHQL_VALIDATION_FAILED" (same code for each error)
      • VariantsData errors:
        • message: 'Cannot query field "variants" on type "modelNode".'
        • code: "GRAPHQL_VALIDATION_FAILED"
  • (3) The publish validation seems to be preventing models that were previously valid from getting published. The following 2 fields are no longer allowing null values, despite the Prod Catalog containing many published examples of null or N/A for them:
    • growth_rate
    • age_at_diagnosis
  • To get around (3) locally, I added .nullable(true) to both fields under getPublishSchema.

I know I've seen the issues under (2) addressed on your local, so maybe it's something you'll only see if you're setting things up from scratch in OpenSearch or when attempting to migrate data from Elasticsearch?

To reproduce the issues under (2), the quickest way is to run the repairEs script, as this will delete your current indices, initialize new ones, and then republish all published models in your db.

# from project root
ENV=prd npm run repairEs

This should simulate starting from a fresh OpenSearch instance/empty indices and should hopefully reproduce the issues in (2).

Note that you'll likely need to trigger a variant/gene index publish after this to get the "Search by Genes" filters working on the home page. Do this by editing a model in the CMS, opening the "Variants" tab and use the "+ Research Somatic Variants" dropdown to do an "Automatic Import from GDC"

Image

If you are able to reproduce the issues under (2), I suspect we might need to revisit some of the new model/publish schemas. Comparing things against the 1.26.2 tag might be a good way to double-check that we're including everything we need: https://github.com/nci-hcmi-catalog/portal/tree/1.26.2

One thing I noticed when comparing an older ES index to a new OS index: looks like there were 3 fields related to matched models before.
Image

  • matched_models (array)
  • has_matched_models (boolean)
  • matched_models_list (string list of relevant matched models, containing the name of the current model)

I'm not sure if that might be related to the errors I'm getting loading the Model Details page, but it might be worth looking into.

Comment thread cms/src/services/search-client/publish.js Outdated
Copy link
Copy Markdown
Contributor

@mistryrn mistryrn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing the "Last Updated" and "publish validation" issues from my previous feedback. I've re-tested locally and can confirm that those are mostly resolved 🎉

Additionally, the index issues I encountered during my testing last week are no longer occurring! Looking back, it may have been related to my running both ES and OS locally at the same time on Thursday 🙈

Just two more small issues, but these won't block testing on the new dev server:

  • (1) "Last Updated" is still not getting updated on Save. Thanks for fixing it on Publish, but we should also be updating this on Save as well for consistency with the existing behaviour (and to make it easier to find the last model worked on if it wasn't published right away).
  • (2) Individual model "Delete" functionality is currently broken (bulk delete still works). This is unrelated to any of your code changes, but instead a consequence of the Mongo/Mongoose 8 upgrade. Looks like Mongoose 8 relies on a dependency mquery@5.0.0 that tries to use a deprecated method .findOneAndRemove (which, consequently, was removed in Mongoose 8) 🙃
    • Not a huge deal, the fix would just be to upgrade to Mongoose 9, which uses mquery@6.0.0, which no longer references the deprecated methods.
    • I skimmed the breaking changes for Mongoose 8 → 9, the main change that might impact us is "Update pipelines disallowed by default": https://mongoosejs.com/docs/migrating_to_9.html#update-pipelines-disallowed-by-default
      • We use this functionality in a few of our migrations, but we should be able to just update them by adding , { updatePipeline: true }
    • If upgrading to Mongoose 9 breaks too many things, we could alternatively use a module resolution in the root package.json to force mquery to resolve to 6.0.0

Let's try to get (1) resolved ASAP, but (2) can be spun out into a separate ticket and we can inform the NIH of the temporary workaround if we don't resolve it before going to Prod.


Just leaving some reference for (2) below:

Image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants