Bebop
=====

- Bebop_ is a Cray CS400 cluster with Intel Broadwell and Knights Landing compute
+ Bebop_ is a Cray CS400 cluster with Intel Broadwell compute
nodes available in the Laboratory Computing Resources
Center (LCRC) at Argonne National
Laboratory.
@@ -52,24 +52,24 @@ for installing libEnsemble.
Job Submission
--------------

- Bebop uses Slurm_ for job submission and management. The two commands you'll
- likely use the most to run jobs are ``srun`` and ``sbatch`` for running
- interactively and batch, respectively.
-
- libEnsemble node-worker affinity is especially flexible on Bebop. By adjusting
- ``srun`` runtime options_ users may assign multiple libEnsemble workers to each
- allocated node(oversubscription) or assign multiple nodes per worker.
+ Bebop uses PBS for job submission and management.

Interactive Runs
^^^^^^^^^^^^^^^^

- You can allocate four Knights Landing nodes for thirty minutes through the following::
+ You can allocate four Broadwell nodes for thirty minutes through the following::
+
+     qsub -I -A <project_id> -l select=4:mpiprocs=4 -l walltime=30:00

-     salloc -N 4 -p knl -A [username OR project] -t 00:30:00
+ Once in the interactive session, you may need to reload your modules::

- With your nodes allocated, queue your job to start with four MPI ranks::
+     cd $PBS_O_WORKDIR
+     module load anaconda3 gcc openmpi aocl
+     conda activate bebop_libe_env

-     srun -n 4 python calling.py
+ Now run your script with four workers (one for generator and three for simulations)::
+
+     python my_libe_script.py --comms local --nworkers 4

``mpirun`` should also work. This line launches libEnsemble with a manager and
**three** workers to one allocated compute node, with three nodes available for
@@ -83,57 +83,10 @@ be initiated with ``libE_specs["dedicated_mode"]=True``
    and not oversubscribing, specify one more MPI process than the number of
    allocated nodes. The manager and first worker run together on a node.

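The "one more MPI process than allocated nodes" rule can be sketched in a few lines of shell. This is a minimal illustration, not part of the documented workflow: it assumes a PBS-style node file (one hostname per line, repeated once per MPI slot, as PBS provides through ``$PBS_NODEFILE``), and the helper name ``count_procs``, the sample hostnames, and the final ``mpirun`` line are hypothetical.

```shell
#!/bin/bash
# Count the distinct hosts in a node file and add one process for the manager.
count_procs() {
    local nodefile="$1"
    local num_nodes
    num_nodes=$(sort -u "$nodefile" | wc -l)
    echo $((num_nodes + 1))
}

# Stand-in node file for illustration (four nodes, two slots each).
# Inside a real PBS job, pass "$PBS_NODEFILE" instead.
sample_nodefile=$(mktemp)
printf '%s\n' node01 node01 node02 node02 \
              node03 node03 node04 node04 > "$sample_nodefile"

nprocs=$(count_procs "$sample_nodefile")
rm -f "$sample_nodefile"

# Print the launch line rather than running it; the script name is an assumption.
echo "mpirun -n $nprocs python my_libe_script.py"
```

With four allocated nodes this yields five processes, which matches the node and task counts in the removed Slurm batch example (``-N 4`` with ``--ntasks 5``).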
- If you would like to interact directly with the compute nodes via a shell,
- the following starts a bash session on a Knights Landing node
- for thirty minutes::
-
-     srun --pty -A [username OR project] -p knl -t 00:30:00 /bin/bash
-
.. note::
    You will need to reactivate your conda virtual environment and reload your
    modules! Configuring this routine to occur automatically is recommended.

- Batch Runs
- ^^^^^^^^^^
-
- Batch scripts specify run settings using ``#SBATCH`` statements. A simple example
- for a libEnsemble use case running in :doc:`distributed<platforms_index>` MPI
- mode on Broadwell nodes resembles the following:
-
- .. code-block:: bash
-     :linenos:
-
-     #!/bin/bash
-     #SBATCH -J myjob
-     #SBATCH -N 4
-     #SBATCH -p bdwall
-     #SBATCH -A myproject
-     #SBATCH -o myjob.out
-     #SBATCH -e myjob.error
-     #SBATCH -t 00:15:00
-
-     # These four lines construct a machinefile for the executor and slurm
-     srun hostname | sort -u > node_list
-     head -n 1 node_list > machinefile.$SLURM_JOBID
-     cat node_list >> machinefile.$SLURM_JOBID
-     export SLURM_HOSTFILE=machinefile.$SLURM_JOBID
-
-     srun --ntasks 5 python calling_script.py
-
- With this saved as ``myscript.sh``, allocating, configuring, and running libEnsemble
- on Bebop is achieved by running ::
-
-     sbatch myscript.sh
-
- Example submission scripts for running on Bebop in distributed and centralized mode
- are also given in the :doc:`examples<example_scripts>`.
-
- Debugging Strategies
- --------------------
-
- View the status of your submitted jobs with ``squeue``, and cancel jobs with
- ``scancel <Job ID>``.
-
Additional Information
----------------------

@@ -144,5 +97,3 @@ See the LCRC Bebop docs here_ for more information about Bebop.
.. _conda: https://conda.io/en/latest/
.. _here: https://docs.lcrc.anl.gov/bebop/running-jobs-bebop/
.. _mpi4py: https://mpi4py.readthedocs.io/en/stable/
- .. _options: https://slurm.schedmd.com/srun.html
- .. _Slurm: https://slurm.schedmd.com/