
Commit 5f5c231

Deployed 0d0926a with MkDocs version: 1.3.0
1 parent 5a4de1d commit 5f5c231

4 files changed

Lines changed: 94 additions & 79 deletions


2022-isc22/integration_lumi/index.html

Lines changed: 52 additions & 35 deletions
@@ -1752,7 +1752,7 @@ <h2 id="general-information">General information<a class="headerlink" href="#gen
 <a href="https://eurohpc-ju.europa.eu/discover-eurohpc-ju#ecl-inpage-211">EuroHPC pre-exascale systems</a>
 meant to be installed in 2022-2023,
 together with Leonardo (installed at Cineca) and MareNostrum5 (installed at BSC).
-LUMI, which stands for Large Unified Modern Infrastructure, is hosted by the
+LUMI, which stands for <em>Large Unified Modern Infrastructure</em>, is hosted by the
 <a href="https://www.lumi-supercomputer.eu/lumi-consortium/">LUMI consortium</a>,
 a consortium of 10 countries: Finland, Belgium, Czech Republic, Denmark, Estonia, Iceland,
 Norway, Poland, Sweden, and Switzerland. It was supposed to be installed by the end of 2021,
@@ -1761,39 +1761,52 @@ <h2 id="general-information">General information<a class="headerlink" href="#gen
 <p><a href="https://www.lumi-supercomputer.eu/may-we-introduce-lumi/">
 <img src="../../img/lumi-snowflake.png" style="float:right" width="100%"/>
 </a></p>
-<p>LUMI is a HPE Cray EX supercomputer with several partitions targeted for different use cases.
+
+<p>LUMI is an <strong>HPE Cray EX supercomputer</strong> with several partitions targeted for different use cases.
 It is also possible to run heterogeneous jobs across multiple partitions.</p>
 <ul>
-<li>The main compute power is provided by the LUMI-G GPU compute partition consisting of 2560 nodes.
+<li>
+<p>The main compute power is provided by the <em>LUMI-G</em> GPU compute partition consisting of 2,560 nodes.
 The LUMI-G node is a revolutionary compute node in the x86+GPU-world.
-It is also truly a "GPU first" system.
-Each GPU compute node has a single 64 core AMD EPYC 7A53 "Trento" CPU and 4 MI250X GPUs.
-Each MI250X package contains two GPU compute dies connected to each other via AMD's InfinityFabric
-interconnect and 8 HBM2e memory stacks, 4 per die. The GPUs and CPU are all connected through
-AMD's InfinityFabric interconnect, creating a cache coherent unified memory space in each node.
-The Trento CPU is a zen3 generation product but with an optimised I/O die that can run the
-InfinityFabric interconnect at a higher speed than standard Milan CPUs.
-Each node also has 4 200Gbit/s SlingShot 11 interconnect cards, each connected directly to
-a different GPU package. Each node has 512GB HBM2e RAM spread evenly across the GPU dies and
-512 GB of regular DDR4 DRAM connected to the CPU. The nodes are diskless nodes.
-The (very) theoretical peak performance of a GPU node is
-around 200 Tflops for FP64 vector operations or 400 TFlops for FP64 matrix operations.</li>
-<li>The main CPU partition, called LUMI-C, consists of 1536 nodes with 2 64-core AMD
-EPYC 7763 CPUs. Most nodes have 256 GB of RAM memory, but there are 128 nodes with
+It is also truly a "GPU first" system.</p>
+<p>Each GPU compute node has a single 64-core AMD EPYC 7A53 "Trento" CPU and 4 MI250X GPUs.
+Each MI250X package contains two GPU compute dies connected to each other via AMD's InfinityFabric
+interconnect and 8 HBM2e memory stacks, 4 per die. The GPUs and CPU are all connected through
+AMD's InfinityFabric interconnect, creating a cache-coherent unified memory space in each node.
+The Trento CPU is a Zen3 generation product but with an optimised I/O die that can run the
+InfinityFabric interconnect at a higher speed than standard Milan CPUs.</p>
+<p>Each node also has four 200 Gbit/s SlingShot 11 interconnect cards, each connected directly to
+a different GPU package. Each node has 512 GB of HBM2e RAM spread evenly across the GPU dies and
+512 GB of regular DDR4 DRAM connected to the CPU. The nodes are diskless.
+The (very) theoretical peak performance of a GPU node is
+around 200 TFlops for FP64 vector operations or 400 TFlops for FP64 matrix operations.</p>
+</li>
+<li>
+<p>The main CPU partition, called <em>LUMI-C</em>, consists of 1,536 nodes with two 64-core AMD
+EPYC 7763 CPUs. Most nodes have 256 GB of RAM, but there are 128 nodes with
 512 GB of RAM and 32 nodes with 1 TB of RAM. In the final system, each node will be equipped
-with one 200 Gbit/s SlingShot 11 interconnect card. These nodes are diskless nodes.</li>
-<li>A section mostly meant for interactive data analysis consists of 8 nodes with 2 64-core
+with one 200 Gbit/s SlingShot 11 interconnect card. These nodes are diskless.</p>
+</li>
+<li>
+<p>A section mostly meant for interactive data analysis consists of 8 nodes with two 64-core
 AMD EPYC 7742 CPUs and 4 TB of RAM per node. These nodes are connected to the interconnect
-through 2 100 Gbit/s SlingShot 10 links.</li>
-<li>The second part of LUMI-D, the data visualisation section, consists of 8 nodes with two
+through two 100 Gbit/s SlingShot 10 links.</p>
+</li>
+<li>
+<p>The second part of <em>LUMI-D</em>, the data visualisation section, consists of 8 nodes with two
 64-core AMD EPYC 7742 CPUs and 8 NVIDIA A40 GPUs each. Each node has 2 TB of RAM and is
-connected to the interconnect through 2 100 Gbit/s SlingShot 10 links.</li>
-<li>LUMI-K will be an OpenShift/Kubernetes container cloud platform for running microservices
-with roughly 50 nodes.</li>
-<li>LUMI also has an extensive storage system<ul>
+connected to the interconnect through two 100 Gbit/s SlingShot 10 links.</p>
+</li>
+<li>
+<p><em>LUMI-K</em> will be an OpenShift/Kubernetes container cloud platform for running microservices
+with roughly 50 nodes.</p>
+</li>
+<li>
+<p>LUMI also has an extensive storage system:</p>
+<ul>
 <li>There is a 7 PB flash-based storage system with 2 TB/s of bandwidth and high IOPS
 capability using the Lustre parallel file system and Cray ClusterStor E1000 technology.</li>
-<li>The main disk based storage contains of 4 20 PB storage systems based on regular hard
+<li>The main disk-based storage consists of four 20 PB storage systems based on regular hard
 disks and also using the Lustre parallel file system.</li>
 <li>A 30 PB Ceph-based encrypted object storage system for storing, sharing and staging data
 will become available at a later date.</li>
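The "around 200 / 400 TFlops" per-node peak quoted above can be sanity-checked with a quick back-of-the-envelope calculation. The per-package figures used here are assumptions taken from AMD's public MI250X spec sheet (roughly 47.9 TFlops FP64 vector and 95.7 TFlops FP64 matrix per package), not from this commit:

```shell
# Hedged sanity check of the LUMI-G per-node peak performance quoted above.
# Assumed per-MI250X-package figures (AMD spec sheet): ~47.9 TFlops FP64 vector,
# ~95.7 TFlops FP64 matrix. Four packages per node.
gpus_per_node=4
# Work in tenths of a TFlops to stay within shell integer arithmetic.
vector_peak=$(( gpus_per_node * 479 / 10 ))   # ~191 TFlops, i.e. "around 200"
matrix_peak=$(( gpus_per_node * 957 / 10 ))   # ~382 TFlops, i.e. "around 400"
echo "FP64 vector peak: ${vector_peak} TFlops"
echo "FP64 matrix peak: ${matrix_peak} TFlops"
```

The result lands slightly under the rounded figures in the text, which is consistent with them being "(very) theoretical" marketing-level peaks.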
@@ -1803,14 +1816,15 @@ <h2 id="challenges">Challenges<a class="headerlink" href="#challenges" title="Pe
 <h2 id="challenges">Challenges<a class="headerlink" href="#challenges" title="Permanent link">&para;</a></h2>
 <ul>
 <li>
-<p>LUMI comes with the HPE Cray Programming Environment (PE) with the Cray Compiling Environment
+<p>LUMI comes with the <em>HPE Cray Programming Environment (PE)</em> with the Cray Compiling Environment
 (a fully Clang/LLVM-based C/C++ compiler and a Fortran compiler using a Cray frontend but
 LLVM-based backend) and GNU and AMD compilers repackaged by Cray. The HPE Cray PE uses
 an MPICH-based MPI implementation (with libfabric backend on SlingShot 11) and also comes
 with its own optimised mathematics libraries containing all the usual suspects you expect
 from an EasyBuild toolchain. The HPE Cray PE environment is managed outside EasyBuild, but
 EasyBuild has a mechanism to integrate with it.</p>
-<p>Due to the specific hardware and software setup of LUMI using the EasyBuild common toolchains
+<p>Due to the specific hardware and software setup of LUMI, using the <a href="https://docs.easybuild.io/en/latest/Common-toolchains.html">EasyBuild common
+toolchains</a>
 is anything but straightforward. Getting Open MPI, a key component of the foss toolchain, to work
 is a bit of a challenge at the moment. The Intel compilers are not a very good match with
 AMD CPUs. One needs to be very careful when choosing the target architecture for code
@@ -1819,7 +1833,7 @@ <h2 id="challenges">Challenges<a class="headerlink" href="#challenges" title="Pe
 cause problems with AMD CPUs.</p>
 </li>
 <li>
-<p>LUMI has only a small central support team of 9 FTE. It is obvious that we cannot give the same
+<p>LUMI has only a <em>small central support team of 9 FTE</em>. It is obvious that we cannot give the same
 level of support for software installations as some of the big sites such as JSC can do. Moreover,
 due to the nature of the LUMI environment with its Cray PE and novel AMD GPUs, installing software
 is more challenging than on your typical Intel + NVIDIA GPU cluster with NVIDIA/Mellanox
@@ -1873,9 +1887,11 @@ <h2 id="challenges">Challenges<a class="headerlink" href="#challenges" title="Pe
 </li>
 </ul>
 <h2 id="solution-with-easybuild-and-lmod">Solution with EasyBuild and Lmod<a class="headerlink" href="#solution-with-easybuild-and-lmod" title="Permanent link">&para;</a></h2>
-<p>On LUMI we selected EasyBuild as our primary software installation tool, but also offer some limited
+<p><a href="https://lumi-supercomputer.eu/easybuild-lumis-primary-software-installation-tool-introduced/">On LUMI we selected EasyBuild as our primary software installation
+tool</a>,
+but also offer some limited
 support for Spack. EasyBuild was selected as there is already a lot of experience with EasyBuild in
-several of the LUMI consortium countries and as it is a good fit with the goals of the EuroHPC JU
+several of the LUMI consortium countries, and as it is a good fit with the goals of the EuroHPC JU
 as they want to establish a European HPC ecosystem with a European technology option at every level.
 The developers of EasyBuild are also very accessible and it helps that the lead developer and several
 of the maintainers are from LUMI consortium countries.</p>
@@ -1935,8 +1951,8 @@ <h3 id="easybuild-for-software-management">EasyBuild for software management<a c
 The fact that each easyconfig file contains a very precise list of dependencies, including versions and
 not only the names of the dependencies, is both a curse and a blessing. It is a curse when we need to upgrade
 to a new compiler and also want to upgrade versions of certain dependencies, as a lot of easyconfig files need
-to be checked and edited. In those cases the automatic concretiser of Spack may help to get running quicker.
-But that very precise description is also a blessing when communicating with users as you can
+to be checked and edited. In those cases the automatic concretiser of Spack may help to get running quicker.</p>
+<p>However, that very precise description is also a blessing when communicating with users as you can
 communicate with them through EasyBuild recipes (and possibly an easystack file, which defines a list of
 easyconfig files to install) rather than having a part of the specification in command line options of the
 tool. So a user doesn't need to copy long command lines and as a support person
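To make the easystack idea mentioned above concrete: an easystack file is essentially a YAML list of easyconfig names. A minimal sketch follows; the file name is illustrative, and the two entries are the easyconfigs used elsewhere in this tutorial:

```yaml
# tutorial.yml -- hypothetical easystack file listing easyconfigs to install.
easyconfigs:
  - CMake-3.22.1-GCCcore-11.2.0.eb
  - SciPy-bundle-2021.10-foss-2021b.eb
```

Such a file would be consumed with something like `eb --easystack tutorial.yml --robot`, so the full specification of what to install travels in one shareable file rather than in command-line options.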
@@ -2079,7 +2095,8 @@ <h3 id="external-modules">External modules<a class="headerlink" href="#external-
 EasyBuild configuration modules take care of pointing to the right metadata file, which is specific for each version of the
 Cray PE and hence each version of the LUMI software stack.</p>
 <h3 id="software-specific-easyblocks">Software-specific easyblocks<a class="headerlink" href="#software-specific-easyblocks" title="Permanent link">&para;</a></h3>
-<p>EasyBuild comes with a lot of software-specific easyblocks. These have only been tested with the common toolchains
+<p>EasyBuild comes with a lot of <a href="https://docs.easybuild.io/en/latest/eb_list_easyblocks.html">software-specific easyblocks</a>.
+These have only been tested with the common toolchains
 in the automated EasyBuild test procedures. As a result, many of those easyblocks will fail with the Cray toolchains
 (and with many other custom toolchains). A common problem is that they don't recognise the compilers as they test
 for the presence of certain modules and hence simply stop with an error message that the compiler is not
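For illustration, the per-PE-version external-modules metadata file mentioned above is an INI-style file that tells EasyBuild what software a Cray module provides. The sketch below is hypothetical: the module names, versions, and prefix variables are assumptions in the spirit of the EasyBuild documentation, not LUMI's actual file:

```ini
# Hypothetical excerpt of a metadata file passed to EasyBuild via
# --external-modules-metadata; all concrete values are illustrative.
[cray-mpich]
name = MPICH
version = 8.1.15
prefix = CRAY_MPICH_PREFIX_DIR

[cray-libsci]
name = cray-libsci
version = 21.08.1.2
prefix = CRAY_LIBSCI_PREFIX_DIR
```

With this in place, easyconfigs can list `cray-mpich` as an external module dependency, and EasyBuild still knows which software name, version, and installation prefix it corresponds to.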
@@ -2137,7 +2154,7 @@ <h2 id="further-reading-and-information">Further reading and information<a class
 <small>
 
 Last update:
-<span class="git-revision-date-localized-plugin git-revision-date-localized-plugin-date">May 24, 2022</span>
+<span class="git-revision-date-localized-plugin git-revision-date-localized-plugin-date">May 27, 2022</span>
 
 
 </small>

2022-isc22/practical_info/index.html

Lines changed: 41 additions & 43 deletions
@@ -1611,52 +1611,50 @@ <h2 id="slack">Slack<a class="headerlink" href="#slack" title="Permanent link">&
 session.</p>
 <p>You can self-request an invitation to join the EasyBuild Slack via
 <a href="https://easybuild.io/join-slack">https://easybuild.io/join-slack</a>.</p>
-<p>??? Reproducing the tutorial environment at home (after the workshop) "(click to show steps)"</p>
-<pre><code>*Note:* These steps might differ on our system. Please reach out to us on Slack if you run into problems.
-
-The prepared environment remains available during the conference. If after the conference
+<details class="success">
+<summary>Reproducing the tutorial environment at home (after the workshop): (click to show steps)</summary>
+<p><em>Note:</em> These steps might differ on your system. Please reach out to us on Slack if you run into problems.</p>
+<p>The prepared environment remains available during the conference. If after the conference
 you want to go through the tutorial and try the exercises on your home system, you can follow
-this procedure while working your way through the tutorial:
-
-- [Install EasyBuild](installation.md). We recommend to use the
-  ["Installing EasyBuild with EasyBuild" method](../installation/#method-2-installing-easybuild-with-easybuild),
-  but choosing a different directory for the `--prefix` argument. That directory should
-  then be used wherever `/easybuild` is used in the tutorial text.
-
-  Assume that the installation directory is stored in `$_PREFIX_`. The series of commands to install
-  EasyBuild and make the EasyBuild module available are
-  ```shell
-  module unuse $MODULEPATH
-  export EB_TMPDIR=/tmp/$USER/eb_tmp
-  python3 -m pip install --ignore-installed --prefix $EB_TMPDIR easybuild
-  export PATH=$EB_TMPDIR/bin:$PATH
-  export PYTHONPATH=$(/bin/ls -rtd -1 $EB_TMPDIR/lib*/python*/site-packages | tail -1):$PYTHONPATH
-  export EB_PYTHON=python3
-  eb --install-latest-eb-release --prefix $_PREFIX_
-  module use $_PREFIX_/modules/all
-  ```
-  The first line (the `module unuse` command) cleans the environment and assures that modules already
-  installed on the system will not screw up the installation that you intend to do.
-
-  Alternatively, when newer versions of EasyBuild are available than the version 4.5.4 used to prepare
-  this tutorial, the line with `eb --install-latest-eb-release` can be replaced with
-  ```shell
-  eb EasyBuild-4.5.4.eb --prefix $_PREFIX_
-  ```
-  to install the version of EasyBuild used for the preparation of this tutorial.
-
-- Install the software needed for the tutorial in the same directory structure as EasyBuild.
+this procedure while working your way through the tutorial:</p>
+<ul>
+<li>
+<p><a href="../installation/">Install EasyBuild</a>. We recommend using the
+<a href="../installation/#method-2-installing-easybuild-with-easybuild">"Installing EasyBuild with EasyBuild" method</a>,
+but choosing a different directory for the <code>--prefix</code> argument. That directory should
+then be used wherever <code>/easybuild</code> is used in the tutorial text.</p>
+<p>Assume that the installation directory is stored in <code>$_PREFIX_</code>. The series of commands to install
+EasyBuild and make the EasyBuild module available are
+<div class="highlight"><pre><span></span><code>module unuse <span class="nv">$MODULEPATH</span>
+<span class="nb">export</span> <span class="nv">EB_TMPDIR</span><span class="o">=</span>/tmp/<span class="nv">$USER</span>/eb_tmp
+python3 -m pip install --ignore-installed --prefix <span class="nv">$EB_TMPDIR</span> easybuild
+<span class="nb">export</span> <span class="nv">PATH</span><span class="o">=</span><span class="nv">$EB_TMPDIR</span>/bin:<span class="nv">$PATH</span>
+<span class="nb">export</span> <span class="nv">PYTHONPATH</span><span class="o">=</span><span class="k">$(</span>/bin/ls -rtd -1 <span class="nv">$EB_TMPDIR</span>/lib*/python*/site-packages <span class="p">|</span> tail -1<span class="k">)</span>:<span class="nv">$PYTHONPATH</span>
+<span class="nb">export</span> <span class="nv">EB_PYTHON</span><span class="o">=</span>python3
+eb --install-latest-eb-release --prefix <span class="nv">$_PREFIX_</span>
+module use <span class="nv">$_PREFIX_</span>/modules/all
+</code></pre></div>
+The first line (the <code>module unuse</code> command) cleans the environment and ensures that modules already
+installed on the system will not screw up the installation that you intend to do.</p>
+<p>Alternatively, when newer versions of EasyBuild are available than the version 4.5.4 used to prepare
+this tutorial, the line with <code>eb --install-latest-eb-release</code> can be replaced with
+<div class="highlight"><pre><span></span><code>eb EasyBuild-4.5.4.eb --prefix <span class="nv">$_PREFIX_</span>
+</code></pre></div>
+to install the version of EasyBuild used for the preparation of this tutorial.</p>
+</li>
+<li>
+<p>Install the software needed for the tutorial in the same directory structure as EasyBuild.
 This can be done in a single command (after loading the EasyBuild module). The workings of this command are explained in the
-["Configuring EasyBuild"](configuration.md) and ["Basic usage of EasyBuild"](basic_usage.md)
+<a href="../configuration/">"Configuring EasyBuild"</a> and <a href="../basic_usage/">"Basic usage of EasyBuild"</a>
 sections:
-```shell
-module load EasyBuild
-eb CMake-3.22.1-GCCcore-11.2.0.eb SciPy-bundle-2021.10-foss-2021b.eb --prefix $_PREFIX_ --robot
-```
-
-Note that the installation can take a few hours and that some steps require a lot of CPU time (e.g., the testing
-done when installing SciPy), so you may not be able to do it on the login nodes of a cluster.
-</code></pre>
+<div class="highlight"><pre><span></span><code>module load EasyBuild
+eb CMake-3.22.1-GCCcore-11.2.0.eb SciPy-bundle-2021.10-foss-2021b.eb --prefix <span class="nv">$_PREFIX_</span> --robot
+</code></pre></div></p>
+</li>
+</ul>
+<p>Note that the installation can take a few hours and that some steps require a lot of CPU time (e.g., the testing
+done when installing SciPy), so you may not be able to do it on the login nodes of a cluster.</p>
+</details>
 <hr />
 <hr />
 <p><a href="../introduction/"><em>next: Introduction</em></a> - <a href="../"><em>(back to overview page)</em></a></p>

search/search_index.json

Lines changed: 1 addition & 1 deletion
Large diffs are not rendered by default.

sitemap.xml.gz

0 Bytes
Binary file not shown.

0 commit comments
