| 1 | +# fire-parallel |
| 2 | + |
| 3 | +The program `fire-parallel` was introduced into ldmx-sw in v4.5.7 and can be used to run multiple copies of the same `fire config.py` across a handful of local cores on your computer. |
| 4 | +Under the hood, `fire-parallel` uses [GNU parallel](https://www.gnu.org/software/parallel/sphinx.html), which has a |
| 5 | +specific structure for defining the different command-line arguments that spawn the different copies of the process. |
| 6 | + |
| 7 | +The most basic example is running the same simulation job but with different run numbers to seed the random number generation. |
| 8 | +If `fire config.py 1` creates `events-1.root`, then you can |
| 9 | +``` |
| 10 | +fire-parallel config.py ::: 1 2 3 4 5 |
| 11 | +``` |
| 12 | +to do the five different runs all in parallel on the available cores on the machine. |
| 13 | +For the full details, see the GNU parallel documentation linked above. It is a good resource, and you can get much fancier than this basic example! |
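
When the list of run numbers gets long, the shell can generate the arguments that follow `:::` instead of typing them by hand. The following is a minimal sketch, not part of `fire-parallel` itself; `run_numbers` is a hypothetical helper name introduced only for illustration.

```shell
# Hypothetical helper (not part of ldmx-sw): print COUNT consecutive run
# numbers starting at FIRST, separated by single spaces.
run_numbers() {
  first="$1"
  count="$2"
  i=0
  out=""
  while [ "$i" -lt "$count" ]; do
    # append the next run number, adding a space only between entries
    out="$out${out:+ }$((first + i))"
    i=$((i + 1))
  done
  printf '%s\n' "$out"
}

# Example (not executed here): twenty runs numbered 1 through 20.
# fire-parallel config.py ::: $(run_numbers 1 20)
```

If `seq` is available on your system, `$(seq 1 20)` produces the same list of arguments.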
| 14 | + |
| 15 | +~~~admonish note title="Note on Performance" collapsible=true |
| 16 | +Since all of these jobs attempt to write to files on the same disk, we do not see a direct N-fold speed-up when using N cores. In our basic testing, the speed-up appears to be limited to roughly 2x. |
| 17 | + |
| 18 | +This is okay as a first attempt to parallelize ldmx-sw processing, and it lines up with the number of cores assigned to worker nodes at a few of the clusters that LDMX collaborators have access to. |
| 19 | +~~~ |
| 20 | + |
| 21 | +~~~admonish warning title="Careful!" |
| 22 | +The `fire-parallel` script does not check that the arguments you provide, or the configuration script being run, define distinct jobs. Make sure you don't repeat run numbers or write data from different jobs to the same file. Either mistake will lead to confusing results that are not physical! |
| 23 | +~~~ |
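
Since `fire-parallel` does not check for repeated arguments, a small guard in your own shell can catch repeated run numbers before anything is launched. This is a sketch under the assumption that each argument is a single run number; `all_distinct` is a hypothetical helper, not something ldmx-sw provides. It cannot tell whether the configuration script writes different jobs to the same file.

```shell
# Hypothetical helper (not part of ldmx-sw): succeed only if every argument
# is unique; otherwise report the repeated values on stderr and fail.
all_distinct() {
  # one argument per line, sorted so that uniq -d can list the repeats
  dupes=$(printf '%s\n' "$@" | sort | uniq -d)
  if [ -n "$dupes" ]; then
    echo "repeated arguments:" $dupes >&2
    return 1
  fi
  return 0
}

# Example (not executed here): only launch the jobs if the run numbers differ.
# all_distinct 1 2 3 4 5 && fire-parallel config.py ::: 1 2 3 4 5
```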
| 24 | + |
| 25 | +### Passing Options Directly to GNU Parallel |
| 26 | +The options (starting with `-`) provided to `fire-parallel` are assumed to be options to `fire-parallel` or |
| 27 | +the configuration script. If you want to pass options to GNU `parallel` itself (for example, using `-j` to limit |
| 28 | +the number of cores it should use), you should store them in the `PARALLEL` environment variable. |
| 29 | +``` |
| 30 | +denv config env copy PARALLEL="-j2" |
| 31 | +denv fire-parallel config.py ::: 1 2 3 4 5 # only does 2 at a time |
| 32 | +``` |
| 33 | + |
| 34 | +### Combining Results |
| 35 | +Often the first thing you want to do after running multiple jobs in parallel is to combine the results into |
| 36 | +a single output file to look at. |
| 37 | +Assuming the output file is small enough (for example, it's just histograms or very few events), you can |
| 38 | +use ROOT's `hadd` program to merge several ROOT files together. |
| 39 | +``` |
| 40 | +denv hadd output.root input-1.root input-2.root input-3.root ... |
| 41 | +``` |
| 42 | +You may need to enter the `denv` interactively if you have too many ROOT files to pass on the command line. |
| 43 | +``` |
| 44 | +denv |
| 45 | +hadd output.root input-*.root |
| 46 | +``` |
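
Before merging, it can be worth confirming that every job actually produced its file, since a glob like `input-*.root` silently merges whatever happens to be present. The sketch below assumes the output files are named `<prefix>-<run>.root`, as in the examples above; `missing_outputs` is a hypothetical helper introduced for illustration.

```shell
# Hypothetical helper (not part of ldmx-sw): given a file prefix and a list
# of run numbers, print each expected file PREFIX-RUN.root that is missing
# from the current directory.
missing_outputs() {
  prefix="$1"
  shift
  for run in "$@"; do
    file="${prefix}-${run}.root"
    [ -f "$file" ] || printf '%s\n' "$file"
  done
}

# Example (not executed here): merge only if nothing is missing.
# [ -z "$(missing_outputs input 1 2 3 4 5)" ] && hadd output.root input-*.root
```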
| 47 | + |
| 48 | +### Past ldmx-sw Versions |
| 49 | +`fire-parallel` is just a shell script, so you can use it as inspiration to write your own specific one! |
| 50 | +It will work with prior versions of ldmx-sw if you copy it into your workspace. For example, |
| 51 | +``` |
| 52 | +# using a version of ldmx-sw that did not have fire-parallel |
| 53 | +denv init ldmx/pro:v4.4.7 |
| 54 | +# download fire-parallel into this workspace |
| 55 | +wget https://raw.githubusercontent.com/LDMX-Software/ldmx-sw/refs/heads/trunk/Framework/app/fire-parallel |
| 56 | +# run fire-parallel from here |
| 57 | +denv ./fire-parallel ... |
| 58 | +``` |