Skip to content

Offline Search Workflow Handling of Bank Directories#5313

Merged
kkacanja merged 9 commits intogwastro:masterfrom
kkacanja:kkacanja-workflow
Mar 31, 2026
Merged

Offline Search Workflow Handling of Bank Directories#5313
kkacanja merged 9 commits intogwastro:masterfrom
kkacanja:kkacanja-workflow

Conversation

@kkacanja
Copy link
Copy Markdown
Contributor

This PR allows for the offline search workflow to be able to handle pre-split template banks. The user can specify the directories for the injections and the full data path in the config file as

[workflow-tmpltbank]
tmpltbank-method = MANUAL_DIRECTORY
tmpltbank-pregenerated-bank = /home/path/to/master/bank.hdf

[workflow-tmpltbank-full_data]
tmpltbank-directory = /home/path/to/banks/

[workflow-tmpltbank-injections]
tmpltbank-directory =  /home/path/to/injection/banks/

[workflow-splittable]
splittable-method =NOOP

The workflow will then utilize the banks in the directory and use them directly in the inspiral jobs.

@kkacanja kkacanja changed the title Adding the files neccesary to take in bank directories Offline Search Workflow Handling of Bank Directories Mar 24, 2026
@ahnitz
Copy link
Copy Markdown
Member

ahnitz commented Mar 24, 2026

@kkacanja Make sure you look through the failed small search especially. If it's a transient issue, just rerun it, but if there are failures there you'll need to fix them before we can approve this. Also fix the flake 8 issues.

@kkacanja
Copy link
Copy Markdown
Contributor Author

@kkacanja Make sure you look through the failed small search especially. If it's a transient issue, just rerun it, but if there are failures there you'll need to fix them before we can approve this. Also fix the flake 8 issues.

There were issues in how the bank was being passed to hdf_trigger_merge. It was using the original pre-generated bank. Looking at how to correct it to use the master bank from pycbc_coinc_bank2hdf which adds in the template hashes.

@spxiwh
Copy link
Copy Markdown
Contributor

spxiwh commented Mar 24, 2026

@kkacanja This is a great feature to add (especially if using compressed template banks) ... It's much cleaner than doing this via the cache files, and changing file names a lot, as we have been doing for recent LVK runs.

A request though: Can this be done without changes to pycbc_make_offline_search_workflow ... If the changes are implemented in the lower level functions it's trivial for other workflows (e.g. GRB) to just pick this up and use it.

If I also treat workflow-tmpltbank and workflow-splittable as two separate things, I think all the changes here are in workflow-splittable not in workflow-tmpltbank, so shouldn't the options be added into workflow-splittable? ... I guess I mean something like:

# No changes here
[workflow-tmpltbank]
tmpltbank-method = PREGENERATED_BANK
tmpltbank-pregenerated-bank = /home/path/to/master/bank.hdf

# Everything else is workflow-splittable
[workflow-splittable]
splittable-method =MANUAL DIRECTORY

[workflow-splittable-full_data]
tmpltbank-directory = /home/path/to/banks/

[workflow-splittable-injections]
tmpltbank-directory =  /home/path/to/injection/banks/

Then one still calls setup_tmpltbank_workflow in all cases (which doesn't change) followed by setup_splittable_workflow which handles the new MANUAL_DIRECTORY method.

Does that make sense? Is there a reason why this doesn't work?

@ahnitz
Copy link
Copy Markdown
Member

ahnitz commented Mar 24, 2026

@kkacanja I forget why we changed our mind to go this route, but I think @spxiwh point has merit. What do you think of that organization?

@kkacanja
Copy link
Copy Markdown
Contributor Author

Yea this should be doable, I'll make the changes according to Ian's suggestions. I went the template bank route in hindsight, since it seemed to be the easiest to implement, but I think the split bank route is now the better choice. Thank you for the suggestions!

@kkacanja kkacanja force-pushed the kkacanja-workflow branch from a6de9aa to 43a8dc0 Compare March 25, 2026 15:05
@kkacanja
Copy link
Copy Markdown
Contributor Author

These changes shouldn't affect any of the current tests. I have tested that the method works for the offline search.
The syntax remains as:


[workflow-tmpltbank]
tmpltbank-method = PREGENERATED_BANK
tmpltbank-pregenerated-bank = /home/path/to/final/bank.hdf

[workflow-splittable]
splittable-method = MANUAL_DIRECTORY

[workflow-splittable-full_data]
tmpltbank-directory = /home/path/to/full/banks/

[workflow-splittable-injections]
tmpltbank-directory = /home/path/to/inj/banks/

Do we want to implement an additional test for using this method?

@ahnitz
Copy link
Copy Markdown
Member

ahnitz commented Mar 25, 2026

@kkacanja It's not a bad idea to add a test. We can leave it as an optional pass for now though. It's also the the test we can update with the other search improvements we are working on that aren't merged yet, so it should run in parallel to the existing search anyway. It makes sense to start now, so that when those other changes get put it in we have the test updated along with it without removing the standard search configuration.

@kkacanja kkacanja requested a review from ahnitz March 25, 2026 17:28
@ahnitz
Copy link
Copy Markdown
Member

ahnitz commented Mar 26, 2026

@kkacanja I think this code is now much more straightforward. Can you add a test? Then this should be ready to go. The test can essentially reuse the files from the existing search, and just have a script that calls the code to presplit the bank it generates.

@spxiwh
Copy link
Copy Markdown
Contributor

spxiwh commented Mar 26, 2026

Thanks @kkacanja .. I think this looks much more straightforward now as well, and easier to use in other applications. Thanks!

Comment thread examples/search_splittable/check_job.py Outdated
@@ -0,0 +1,51 @@
import subprocess
import time
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kkacanja Doe sthis need to be a new file? Why not just link to the existing script?

Comment thread examples/search_splittable/stats.sh Outdated
@@ -0,0 +1,44 @@
#!/bin/bash
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kkacanja It might be worth checking which scripts you actually need to modify and simply link to the existing ones if you can just directly reuse? We can also duplicate later if someone needs to be changed, but it woudl be good to avoid duplication where possible.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you double check I stripped down the folder to only the bank generation and the new analysis file. Do you want to add in the code to generate the workflow?

@kkacanja kkacanja requested a review from ahnitz March 27, 2026 17:16
Copy link
Copy Markdown
Member

@ahnitz ahnitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kkacanja Pending the test passing in the CI, I think this is now ready to go.

@kkacanja kkacanja enabled auto-merge (squash) March 30, 2026 18:34
@kkacanja kkacanja merged commit 023fe3d into gwastro:master Mar 31, 2026
36 of 49 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants