EasyBuild can submit jobs to different backends including Slurm to install software,
to *distribute* the often time-consuming installation of a set of software applications and
the dependencies they require across a cluster. Each individual package is installed in a separate
job, and job dependencies are used to manage the dependencies between packages, so that no build
is started before its dependencies are in place.

This is done via the `--job` command line option.

It is important to be aware of some details before you start using this, which we'll cover here.

!!! Warning "This section is not supported on LUMI, use at your own risk"

    EasyBuild on LUMI is currently not fully configured to support job submission via Slurm. Several
    changes would be needed to the configuration of EasyBuild, including the location of the
    temporary files and build directory. Those have to be made by hand.

    Due to the setup of the central software stack, this feature is currently useless for installing
    the central stack. For user installations there are also limitations, as the environment
    on the compute nodes is different from the login nodes, so, e.g., different locations for
    temporary files are being used. These would only be refreshed if the EasyBuild configuration
    modules were reloaded on the compute nodes, which cannot currently be done in the way Slurm
    job submission is set up in EasyBuild.

    Use the material in this section with care; it has not been completely tested.


## Configuration

The EasyBuild configuration that is active at the time that `eb --job` is used
is *passed down* into the submitted jobs, via command line options to the `eb` command
that is run in the job script. This includes configuration settings
that are specified via an [EasyBuild configuration file](configuration.md) or environment variables.
This implies that any EasyBuild configuration files or `$EASYBUILD_*` environment variables
that are in place in the job environment are most likely *irrelevant*, since the configuration settings
they specify will most likely be overruled by the corresponding command line options.
It also implies, however, that the EasyBuild configuration that is in place when `eb --job` is used
must also work on the compute nodes to which the job is submitted.


## Using `eb --job`

The job backend to be used by EasyBuild must be set
to `Slurm`, for example by setting the corresponding environment variable:

```shell
export EASYBUILD_JOB_BACKEND='Slurm'
```

On LUMI this is taken care of in the EasyBuild configuration modules such as `EasyBuild-user`.


### Job resources

To submit an installation as a job, simply use `eb --job`:
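For example, to submit the installation of a hypothetical easyconfig file `example.eb` as a job:

```shell
# submit the installation as a Slurm job instead of building
# it in the current shell session
eb --job example.eb
```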
For example, to specify a particular account that should be used for the jobs submitted by EasyBuild
(equivalent with using the `-A` or `--account` command line option for `sbatch`):

```shell
export SBATCH_ACCOUNT='project_XXXXXXXXX'
```

Or to submit to a particular Slurm partition (equivalent with the `-p` or `--partition` option for `sbatch`):

```shell
export SBATCH_PARTITION='small'
```

For more information about supported `$SBATCH_*` environment variables, see the Slurm documentation for `sbatch`.
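To check which `$SBATCH_*` variables are currently set in your session (and would thus be picked up by the jobs that EasyBuild submits):

```shell
# list all SBATCH_* environment variables that are currently set
# (prints nothing if none are set)
env | grep '^SBATCH_'
```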
Keep in mind that the active EasyBuild configuration is passed down into the submitted jobs,
so any configuration that is present on the compute nodes may not have any effect.

For example, on LUMI it is possible to use `$XDG_RUNTIME_DIR` on the login nodes, which has
the advantage that any leftovers of failed builds will be cleaned up when the user ends their last
login session on that node, but it is not possible to do so on the compute nodes.

```shell
# EasyBuild is configured to use a subdirectory of $XDG_RUNTIME_DIR on the login node
uan01 $ eb --show-config | grep buildpath
buildpath (E) = /run/user/XXXXXXXX/easybuild/build

# use /dev/shm/$USER for build directories when submitting installations as jobs
uan01 $ eb --job --buildpath /dev/shm/$USER/easybuild example.eb --robot
```


### Temporary log files and build directories

The problems with the temporary log files are twofold. First, they may end up in a place
that is not available on the compute nodes. E.g., for the same reasons as for the build
path, the LUMI EasyBuild configuration will place the temporary files in a subdirectory of
`$XDG_RUNTIME_DIR` on the login nodes, but in a subdirectory of `/dev/shm/$USER` on the
compute nodes. The second problem is that if an installation fails, those log files are
no longer accessible, which may leave you wondering about the actual cause of the failing
installation...

To remedy this, there are a couple of EasyBuild configuration options you can use:

* You can use `--tmp-logdir` to let EasyBuild store its temporary log files in an alternative
  location that remains accessible after the job has ended, e.g., a subdirectory of your home directory:
    ```shell
    $ eb --job example.eb --tmp-logdir $HOME/eb_tmplogs
    ```

    This will move at least the log file to a suitable place.

* If you prefer having the entire log file stored in the Slurm job output files,
  you can use `--logtostdout` when submitting the jobs. This will result in extensive logging
  to your terminal window when submitting the jobs, but it will also make EasyBuild
  log to `stdout` when the installation is running in the job, and hence the log messages will be
  captured in the job output files.

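For example, to capture the full EasyBuild log in the Slurm job output files for a hypothetical easyconfig `example.eb`:

```shell
# --logtostdout makes EasyBuild log to stdout inside the job as well,
# so the full log ends up in the Slurm job output file
eb --job example.eb --logtostdout
```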
The build directory of course also suffers from the problem of no longer being accessible if the
installation fails, and there it is not so easy to find a solution. Building on a shared file system
is not only much slower, but in particular on parallel file systems like GPFS/Spectrum Scale, Lustre
or BeeGFS, building sometimes fails in strange ways. One thing you can consider if you cannot do the
build on a login node (e.g., because the code is not suitable for cross-compiling or the configure
system runs tests that would fail on the login node), is to retry the installation in an
interactive job, so that you can inspect the build directory after the installation fails.

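On LUMI, retrying in an interactive job could look as sketched below; the account and partition are placeholders, the requested resources are merely an example, and the snippet has not been fully tested:

```shell
# request an interactive job (placeholder account/partition, 8 cores, 1 hour)
salloc --account=project_XXXXXXXXX --partition=small --ntasks=1 --cpus-per-task=8 --time=1:00:00
# open a shell on the allocated compute node
srun --pty bash

# on the compute node: reload the EasyBuild configuration modules, then
# retry the build without --job so the build directory can be inspected
module load LUMI/21.12 partition/C EasyBuild-user
eb example.eb --robot
```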
### Lock files

EasyBuild creates a lock file for each installation, to prevent the same installation from being
performed in multiple jobs at the same time. If a job fails or is cancelled, its lock file may be
left behind. You will then need to either remove the lock file (in the `software/.locks`
subdirectory of `installpath`) manually, or re-submit the job with `eb --job --ignore-locks`.


## Example

As an example, we will let EasyBuild submit jobs to install `AUGUSTUS` with the `foss/2020b` toolchain.

!!! Warning "This example does not work on LUMI"

    Note that this is an example using the foss common toolchain, which is not available on LUMI.
    For this reason it does not work on LUMI.

### Configuration

Before using `--job`, let's make sure that EasyBuild is properly configured:

```shell
# load the EasyBuild-user module (central installations will not work
# at all using job submission)
module load LUMI/21.12
module load partition/C
module load EasyBuild-user

# use ramdisk for build directories and temporary files
export EASYBUILD_BUILDPATH=/dev/shm/$USER/build
export EASYBUILD_TMPDIR=/dev/shm/$USER/tmp

# use Slurm as job backend
export EASYBUILD_JOB_BACKEND=Slurm
```
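Before submitting anything, it is worth verifying that the intended settings are active; a quick check (the exact output will differ) is:

```shell
# inspect the active EasyBuild configuration for the settings set above
eb --show-config | grep -E 'buildpath|tmpdir|job-backend'
```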

We will also need to inform Slurm that jobs should be submitted into a particular account, and
in a particular partition:

```shell
export SBATCH_ACCOUNT='project_XXXXXXXXX'
export SBATCH_PARTITION='small'
```

This will be picked up by the `sbatch` commands that EasyBuild will run to submit the software installation jobs.

Let's first check which dependencies are still missing:

```
$ eb AUGUSTUS-3.4.0-foss-2020b.eb --missing
```

Several dependencies are not installed yet, so we will need to use `--robot` to ensure that
EasyBuild also submits jobs to install these first.

To speed up the installations a bit, we will request 8 cores for each submitted job (via `--job-cores`).
That should be sufficient to let each installation finish in (well) under 1 hour,
so we only request 1 hour of walltime per job (via `--job-max-walltime`).

In order to have some meaningful job output files, we also enable trace mode (via `--trace`).

```
$ eb AUGUSTUS-3.4.0-foss-2020b.eb --job --job-cores 8 --job-max-walltime 1 --robot --trace
...
== resolving dependencies ...
...
```

After about 20 minutes, AUGUSTUS and all missing dependencies should be installed:

```
$ ls -lrt $HOME/EasyBuild/modules/.../*.lua | tail -11
-rw-rw----. 1 example example 1634 Mar 29 10:13 /users/example/easybuild/modules/all/HTSlib/1.11-GCC-10.2.0.lua
-rw-rw----. 1 example example 1792 Mar 29 10:13 /users/example/easybuild/modules/all/SAMtools/1.11-GCC-10.2.0.lua
-rw-rw----. 1 example example 1147 Mar 29 10:13 /users/example/easybuild/modules/all/BamTools/2.5.1-GCC-10.2.0.lua
...
-rw-rw----. 1 example example 1365 Mar 29 10:28 /users/example/easybuild/modules/all/SuiteSparse/5.8.1-foss-2020b-METIS-5.1.0.lua
-rw-rw----. 1 example example 2233 Mar 29 10:30 /users/example/easybuild/modules/all/AUGUSTUS/3.4.0-foss-2020b.lua

$ module avail AUGUSTUS

-- EasyBuild managed user software for software stack ... --
   AUGUSTUS/3.4.0-foss-2020b
```
