Commit 16bf770 (2 parents: 6ab55ac + c2c1212)

Merge pull request #1520 from Libensemble/docs/old_resource_cleanups

Cleanup scripts, docs, resources, etc. of Summit, Crusher, Spock, Sunspot. Adjust Bebop docs

17 files changed: 51 additions, 343 deletions

docs/nitpicky (0 additions, 3 deletions)

@@ -42,15 +42,12 @@ py:class pathlib._local.Path
 # Internal paths that are verified importable but Sphinx can't find
 py:class libensemble.resources.platforms.Aurora
 py:class libensemble.resources.platforms.GenericROCm
-py:class libensemble.resources.platforms.Crusher
 py:class libensemble.resources.platforms.Frontier
 py:class libensemble.resources.platforms.Perlmutter
 py:class libensemble.resources.platforms.PerlmutterCPU
 py:class libensemble.resources.platforms.PerlmutterGPU
 py:class libensemble.resources.platforms.Polaris
-py:class libensemble.resources.platforms.Spock
 py:class libensemble.resources.platforms.Summit
-py:class libensemble.resources.platforms.Sunspot
 py:class libensemble.resources.rset_resources.RSetResources
 py:class libensemble.resources.env_resources.EnvResources
 py:class libensemble.resources.resources.Resources

docs/platforms/bebop.rst (12 additions, 61 deletions)

@@ -2,7 +2,7 @@
 Bebop
 =====
 
-Bebop_ is a Cray CS400 cluster with Intel Broadwell and Knights Landing compute
+Bebop_ is a Cray CS400 cluster with Intel Broadwell compute
 nodes available in the Laboratory Computing Resources
 Center (LCRC) at Argonne National
 Laboratory.
@@ -52,24 +52,24 @@ for installing libEnsemble.
 Job Submission
 --------------
 
-Bebop uses Slurm_ for job submission and management. The two commands you'll
-likely use the most to run jobs are ``srun`` and ``sbatch`` for running
-interactively and batch, respectively.
-
-libEnsemble node-worker affinity is especially flexible on Bebop. By adjusting
-``srun`` runtime options_ users may assign multiple libEnsemble workers to each
-allocated node(oversubscription) or assign multiple nodes per worker.
+Bebop uses PBS for job submission and management.
 
 Interactive Runs
 ^^^^^^^^^^^^^^^^
 
-You can allocate four Knights Landing nodes for thirty minutes through the following::
+You can allocate four Broadwell nodes for thirty minutes through the following::
+
+    qsub -I -A <project_id> -l select=4:mpiprocs=4 -l walltime=30:00
 
-    salloc -N 4 -p knl -A [username OR project] -t 00:30:00
+Once in the interactive session, you may need to reload your modules::
 
-With your nodes allocated, queue your job to start with four MPI ranks::
+    cd $PBS_O_WORKDIR
+    module load anaconda3 gcc openmpi aocl
+    conda activate bebop_libe_env
 
-    srun -n 4 python calling.py
+Now run your script with four workers (one for generator and three for simulations)::
+
+    python my_libe_script.py --comms local --nworkers 4
 
 ``mpirun`` should also work. This line launches libEnsemble with a manager and
 **three** workers to one allocated compute node, with three nodes available for
@@ -83,57 +83,10 @@ be initiated with ``libE_specs["dedicated_mode"]=True``
 and not oversubscribing, specify one more MPI process than the number of
 allocated nodes. The manager and first worker run together on a node.
 
-If you would like to interact directly with the compute nodes via a shell,
-the following starts a bash session on a Knights Landing node
-for thirty minutes::
-
-    srun --pty -A [username OR project] -p knl -t 00:30:00 /bin/bash
-
 .. note::
     You will need to reactivate your conda virtual environment and reload your
     modules! Configuring this routine to occur automatically is recommended.
 
-Batch Runs
-^^^^^^^^^^
-
-Batch scripts specify run settings using ``#SBATCH`` statements. A simple example
-for a libEnsemble use case running in :doc:`distributed<platforms_index>` MPI
-mode on Broadwell nodes resembles the following:
-
-.. code-block:: bash
-    :linenos:
-
-    #!/bin/bash
-    #SBATCH -J myjob
-    #SBATCH -N 4
-    #SBATCH -p bdwall
-    #SBATCH -A myproject
-    #SBATCH -o myjob.out
-    #SBATCH -e myjob.error
-    #SBATCH -t 00:15:00
-
-    # These four lines construct a machinefile for the executor and slurm
-    srun hostname | sort -u > node_list
-    head -n 1 node_list > machinefile.$SLURM_JOBID
-    cat node_list >> machinefile.$SLURM_JOBID
-    export SLURM_HOSTFILE=machinefile.$SLURM_JOBID
-
-    srun --ntasks 5 python calling_script.py
-
-With this saved as ``myscript.sh``, allocating, configuring, and running libEnsemble
-on Bebop is achieved by running ::
-
-    sbatch myscript.sh
-
-Example submission scripts for running on Bebop in distributed and centralized mode
-are also given in the :doc:`examples<example_scripts>`.
-
-Debugging Strategies
---------------------
-
-View the status of your submitted jobs with ``squeue``, and cancel jobs with
-``scancel <Job ID>``.
-
 Additional Information
 ----------------------
 
@@ -144,5 +97,3 @@ See the LCRC Bebop docs here_ for more information about Bebop.
 .. _conda: https://conda.io/en/latest/
 .. _here: https://docs.lcrc.anl.gov/bebop/running-jobs-bebop/
 .. _mpi4py: https://mpi4py.readthedocs.io/en/stable/
-.. _options: https://slurm.schedmd.com/srun.html
-.. _Slurm: https://slurm.schedmd.com/
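
The revised Bebop page keeps only an interactive example. For batch runs under PBS, a job script along the following lines should work. This is a hedged sketch, not an official LCRC recipe: `<project_id>` and `bebop_libe_env` are placeholders, and the module list simply mirrors the interactive example in the diff above.

```shell
#!/bin/bash
#PBS -N libe_job
#PBS -A <project_id>
#PBS -l select=4:mpiprocs=4
#PBS -l walltime=30:00
#PBS -o libe_job.out
#PBS -e libe_job.err

# Start from the submission directory
cd $PBS_O_WORKDIR

# Reload the environment, as in the interactive example
module load anaconda3 gcc openmpi aocl
conda activate bebop_libe_env

# One manager plus four workers (one generator, three simulators)
python my_libe_script.py --comms local --nworkers 4
```

Saved as ``myscript.sh``, this would be submitted with ``qsub myscript.sh``.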

docs/platforms/example_scripts.rst (3 additions, 2 deletions)

@@ -2,7 +2,7 @@ Example Scheduler Submission Scripts
 ====================================
 
 Below are example submission scripts used to configure and launch libEnsemble
-on a variety of high-powered systems. See :doc:`here<platforms_index>` for more
+on a variety of high-powered systems. See :ref:`here<platform-index>` for more
 information about the respective systems and configuration.
 
 General examples
@@ -73,7 +73,7 @@ System Examples
       :caption: /examples/libE_submission_scripts/bebop_submit_pbs_distrib.sh
       :language: bash
 
-.. dropdown:: Summit - On Launch Nodes with Multiprocessing
+.. dropdown:: Summit (Decommissioned) - On Launch Nodes with Multiprocessing
 
    .. literalinclude:: ../../examples/libE_submission_scripts/summit_submit_mproc.sh
       :caption: /examples/libE_submission_scripts/summit_submit_mproc.sh
@@ -84,3 +84,4 @@ System Examples
    .. literalinclude:: ../../examples/libE_submission_scripts/cobalt_submit_mproc.sh
      :caption: /examples/libE_submission_scripts/cobalt_submit_mproc.sh
      :language: bash
+

docs/platforms/platforms_index.rst (0 additions, 1 deletion)

@@ -215,7 +215,6 @@ libEnsemble on specific HPC systems.
    improv
    perlmutter
    polaris
-   spock_crusher
    summit
    srun
    example_scripts

docs/platforms/spock_crusher.rst (0 additions, 82 deletions)

This file was deleted.

docs/platforms/summit.rst (15 additions, 19 deletions)

@@ -1,17 +1,20 @@
-======
-Summit
-======
+=======================
+Summit (Decommissioned)
+=======================
 
-Summit_ is an IBM AC922 system located at the Oak Ridge Leadership Computing
-Facility (OLCF). Each of the approximately 4,600 compute nodes on Summit contains two
+Summit_ was an IBM AC922 system located at the Oak Ridge Leadership Computing
+Facility (OLCF). Each of the approximately 4,600 compute nodes on Summit contained two
 IBM POWER9 processors and six NVIDIA Volta V100 accelerators.
 
-Summit features three tiers of nodes: login, launch, and compute nodes.
+Summit featured three tiers of nodes: login, launch, and compute nodes.
 
 Users on login nodes submit batch runs to the launch nodes.
 Batch scripts and interactive sessions run on the launch nodes. Only the launch
 nodes can submit MPI runs to the compute nodes via ``jsrun``.
 
+These docs are maintained to guide libEnsemble's usage on three-tier systems and/or
+`jsrun` systems similar to Summit.
+
 Configuring Python
 ------------------
 
@@ -57,13 +60,13 @@ Or, you can install via ``conda``:
 
 See :doc:`here<../advanced_installation>` for more information on advanced options
 for installing libEnsemble.
-
 Special note on resource sets and Executor submit options
+
 ---------------------------------------------------------
 
 When using the portable MPI run configuration options (e.g., num_nodes) to the
 :doc:`MPIExecutor<../executor/mpi_executor>` ``submit`` function, it is important
-to note that, due to the `resource sets`_ used on Summit, the options refer to
+to note that, due to the resource sets used on Summit, the options refer to
 resource sets as follows:
 
 - num_procs (int, optional) – The total number resource sets for this run.
@@ -114,7 +117,7 @@ available on a Summit node, and thus two such tasks may be allocated to each node
 Job Submission
 --------------
 
-Summit uses LSF_ for job management and submission. For libEnsemble, the most
+Summit used LSF_ for job management and submission. For libEnsemble, the most
 important command is ``bsub`` for submitting batch scripts from the login nodes
 to execute on the launch nodes.
 
@@ -191,20 +194,13 @@ Launching User Applications from libEnsemble Workers
 ----------------------------------------------------
 
 Only the launch nodes can submit MPI runs to the compute nodes via ``jsrun``.
-This can be accomplished in user ``sim_f`` functions directly. However, it is highly
+This can be accomplished in user simulator functions directly. However, it is highly
 recommended that the :doc:`Executor<../executor/ex_index>` interface
-be used inside the ``sim_f`` or ``gen_f``, because this provides a portable interface
+be used inside the simulator or generator, because this provides a portable interface
 with many advantages including automatic resource detection, portability,
 launch failure resilience, and ease of use.
 
-Additional Information
-----------------------
-
-See the OLCF guides_ for more information about Summit.
-
 .. _conda: https://conda.io/en/latest/
-.. _guides: https://docs.olcf.ornl.gov/systems/summit_user_guide.html
 .. _LSF: https://www.olcf.ornl.gov/wp-content/uploads/2018/12/summit_workshop_fuson.pdf
 .. _mpi4py: https://mpi4py.readthedocs.io/en/stable/
-.. _resource sets: https://docs.olcf.ornl.gov/systems/summit_user_guide.html#job-launcher-jsrun
-.. _Summit: https://docs.olcf.ornl.gov/systems/summit_user_guide.html
+.. _Summit: https://www.olcf.ornl.gov/olcf-resources/compute-systems/summit/
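
The "resource sets" caveat kept in the Summit page above is the subtle part: on ``jsrun`` systems, the portable options passed to ``submit`` count resource sets rather than MPI ranks. A minimal illustrative sketch of that mapping follows. Note this is not libEnsemble's actual implementation; the function name ``jsrun_args`` and its parameters are hypothetical, while the ``jsrun`` flags (``-n`` total resource sets, ``-r`` resource sets per host, ``-g`` GPUs per resource set) are standard.

```python
def jsrun_args(num_procs=None, num_nodes=None, procs_per_node=None, gpus_per_rset=0):
    """Sketch: map portable submit-style options to jsrun resource-set flags.

    On jsrun systems, "procs" count resource sets, not MPI ranks:
    -n = total resource sets, -r = resource sets per host, -g = GPUs per set.
    """
    if num_procs is None and num_nodes and procs_per_node:
        # Derive the total number of resource sets from the per-node layout
        num_procs = num_nodes * procs_per_node
    args = ["jsrun", "-n", str(num_procs)]
    if procs_per_node:
        args += ["-r", str(procs_per_node)]
    if gpus_per_rset:
        args += ["-g", str(gpus_per_rset)]
    return args

# Two nodes, three resource sets per node, one GPU per resource set
print(" ".join(jsrun_args(num_nodes=2, procs_per_node=3, gpus_per_rset=1)))
# → jsrun -n 6 -r 3 -g 1
```

The point of the sketch is simply that ``-n 6`` here is six resource sets, each of which may itself contain multiple MPI tasks and GPUs.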

docs/resource_manager/resource_detection.rst (1 addition, 1 deletion)

@@ -32,7 +32,7 @@ Resource detection can be disabled by setting
 configuration options on the Executor submit line.
 
 This will usually work sufficiently on
-systems that have application-level scheduling and queuing (e.g., ``jsrun`` on Summit).
+systems that have application-level scheduling and queuing (e.g., ``jsrun``).
 However, on many cluster and multi-node systems, if the built-in resource
 manager is disabled, then runs without a hostlist or machinefile supplied may be
 undesirably scheduled to the same nodes.

examples/libE_submission_scripts/cobalt_submit_mproc.sh (0 additions, 48 deletions)

This file was deleted.
