
Commit 13310e5

Begin section on parallelization

1 parent 8fe3e89

6 files changed: 67 additions & 8 deletions

docs/_toc.yml

Lines changed: 1 addition & 0 deletions

@@ -16,6 +16,7 @@ chapters:
   - file: fundamentals/joint_inversion
   - file: fundamentals/depth_of_investigation
   - file: fundamentals/optimization
+  - file: fundamentals/parallelization
   - file: tutorials/introduction
     sections:
       - file: tutorials/background

docs/fundamentals/images/distributed_parallelization.svg

Lines changed: 1 addition & 0 deletions
Lines changed: 49 additions & 0 deletions

@@ -0,0 +1,49 @@
.. _parallelization:


Parallelization
===============

For a given inversion routine, the problem can be decomposed into a series of sub-problems, or tiles, each assigned a mesh and a survey. During the inversion process, predicted data and derivatives are repeatedly requested from the sub-problems. These operations are parallelized within each sub-problem, as well as externally, so that sub-problems can be computed concurrently.
.. figure:: ./images/distributed_parallelization.svg
    :align: center
    :width: 80%

    Schematic representation of the computing elements of a tiled inversion. Each tile is assigned a mesh and a survey, with array operations parallelized by dask bookending a direct solver. The tiles can be distributed across multiple workers, each with a limited number of threads to optimize performance. Only 1-dimensional arrays are returned to the main process.
The following sections describe the different levels of parallelization used by the inversion routines and how to optimize resources.
Direct Solvers
--------------

Direct solvers are used for all methods governed by partial differential equations (PDEs), such as electromagnetic and electric methods. The `Pardiso <https://github.com/simpeg/pydiso>`_ and `Mumps <https://gitlab.kwant-project.org/kwant/python-mumps>`_ solvers are parallelized using OpenMP. Note that the current implementations of the solvers are not thread-safe, and can therefore not be shared between parallel processes.
The number of threads used by the solvers can be set by running the command

.. code-block::

    set OMP_NUM_THREADS=X

before launching the Python program. Alternatively, setting ``OMP_NUM_THREADS`` as a local environment variable makes the setting permanent. The default value is the number of threads available on the machine.
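As a cross-platform alternative, the variable can also be set from Python itself. This is a sketch, not part of the documented interface; the key point is that the assignment must happen before any solver package is imported, since OpenMP reads the variable when its runtime starts:

```python
import os

# OMP_NUM_THREADS is read when the OpenMP runtime initializes, so it must
# be set before importing any solver package (e.g. pydiso or python-mumps).
os.environ["OMP_NUM_THREADS"] = "4"

# Solver imports would follow here, after the variable is in place.
```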
Dask
----

Most operations related to generating arrays are handled by the `dask <https://www.dask.org/>`_ library. A mixture of ``dask.array`` and ``dask.delayed`` calls is used to parallelize the computations across multiple threads. If a direct solver is involved, the dask operations bookend the solver to avoid thread-safety issues. Otherwise, the dask operations are performed in parallel across the available threads.
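As an illustration of the threaded array operations (the matrix and model names are hypothetical, not the library's internals), a chunked dask array evaluates a sensitivity-times-model product chunk by chunk across threads:

```python
import numpy as np
import dask.array as da

# Hypothetical sensitivity matrix for one tile, split into row chunks so
# each chunk's matrix-vector product can run on a separate thread.
sensitivity = da.ones((1000, 200), chunks=(250, 200))
model = np.full(200, 0.5)

# .compute() triggers the threaded evaluation and returns a 1-D numpy array,
# matching the note above that only 1-dimensional arrays reach the main process.
predicted = sensitivity.dot(model).compute()
```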
35+
36+
37+
Dask.distributed
38+
----------------
39+
40+
For large systems, such as High-Performance Computing (HPC) clusters, the ``dask.distributed`` library can be used to distribute the computation from tiles across multiple ``workers``. It has been found that the performance of direct solvers tend to saturate on large numbers of threads. By spawning multiple processes, each with a limited number of threads, the performance can be improved by running multiple tiles in parallel. The number of workers and threads per worker can be set with the following parameters added to the ui.json file:
.. code-block::

    {
        ...
        "n_workers": X,
        "n_threads": Y,
        "performance_report": true
    }
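For reference, these ui.json options roughly correspond to the following ``dask.distributed`` setup. This is a sketch; the application's actual wiring may differ, and ``sum`` stands in for the per-tile computation:

```python
from dask.distributed import Client, LocalCluster

# Mirrors "n_workers": 2 and "n_threads": 2 from the ui.json file.
# processes=False keeps this sketch in a single process for portability;
# on an HPC cluster, separate worker processes would be used instead.
cluster = LocalCluster(n_workers=2, threads_per_worker=2, processes=False)
client = Client(cluster)

# Each tile would be submitted as an independent task.
futures = [client.submit(sum, range(n)) for n in (10, 100)]
results = client.gather(futures)

client.close()
cluster.close()
```

The ``"performance_report": true`` option corresponds to wrapping the computation in dask's ``performance_report`` context manager, which writes an HTML diagnostic file of the task stream.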

docs/plate-simulation/simulation.rst

Lines changed: 1 addition & 1 deletion

@@ -1,4 +1,4 @@
-.. _plate_simulation_index:
+.. _plate_simulation:
 
 Plate Simulation
 ================

docs/plate-simulation/sweep.rst

Lines changed: 15 additions & 7 deletions

@@ -3,8 +3,15 @@
 Batch Simulations
 =================
 
-The Plate Sweep module provides a user interface for generating and running a batch of simulations by sweeping one or more of the input parameters. The user can select which parameters to sweep and the range of values for each parameter. The results of each simulation are stored in a ``*.geoh5`` file named with a unique identifier.
+The Plate Sweep module provides a user interface for generating and running a batch of simulations by sweeping one or more of the input parameters. The user can select which parameters to sweep and the range of values for each parameter.
+
+.. figure:: /plate-simulation/images/sweep/landing.png
+    :align: center
+    :width: 80%
+
+The results of each simulation are stored in a ``*.geoh5`` file named with a unique identifier. A ``summary.xls`` file can be generated to track the parameters used in previous sweeps.
+
+The following sections describe the user interface, inputs, and methodology of the Plate Sweep module.
 
 
 Interface
@@ -22,15 +29,16 @@ Inputs
 
 - **Plate simulation**: A Plate Simulation group that contains the input parameters for a single plate simulation, as well as the connection to a SimPEG Forward group. Parameters that are not included in the sweep will be taken from this group and used for all simulations.
 - **Output directory**: A directory where the results of each simulation will be stored. Each simulation will be saved in a separate ``*.geoh5`` file named with a unique identifier. The directory is created if it does not exist; otherwise, simulations are appended to it.
 - **Generate summary file**: A boolean option to generate a summary file in the output directory. The summary file is a ``*.xls`` file that contains the input parameters and results of each simulation, allowing users to easily sort over the range of simulation parameters.
-- **Sweep block**: For each the following parameters, users can choose a **starting**, **ending** and **step** value to sweep over a range of values. The application will generate a simulation for each value in the range, while keeping all other parameters constant.
-- **Background**: Over-writing the
+- **[Sweep block]**: For every parameter of :ref:`Plate Simulation <plate_simulation>`, users can choose a **starting**, **ending**, and **step** value to sweep over a range of values. The application will generate a simulation for each value in the range, while keeping all other parameters constant. If a parameter is not included in the sweep, the value set in the input Plate Simulation group will be used for all simulations.
 
 Methodology
 -----------
 
-Something
+This section provides a brief overview of the methodology used in the Plate Sweep module. For more details on the underlying algorithms and implementation, please refer to the source code.
+
+After loading the input parameters from the Plate Simulation group, the application generates a list of parameter combinations based on the specified sweep ranges. For each combination of parameters, a unique identifier is generated using a hash system. If this unique identifier already exists in the ``Output directory``, the simulation is skipped. Otherwise, a copy of the original input ``geoh5`` file is created. The target file is then opened and the Plate Simulation group is modified with the corresponding parameters before running the simulation. The results are saved to the target file, and the process repeats for the next combination of parameters until all combinations have been processed.
+
+If the dask.distributed library is enabled, the simulations are run in parallel using a local cluster. Otherwise, the simulations are run sequentially.
 
-Tutorial
---------
-To be added.
+Finally, if the option to generate a summary file is enabled, a routine extracts the parameters from all ``*.geoh5`` files present in the ``Output directory`` and tabulates them in a ``summary.xls`` file for easy reference and analysis.
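The combination-and-hash bookkeeping described in the methodology can be sketched as follows. The parameter names and hashing scheme here are illustrative, not the module's actual implementation:

```python
import hashlib
import itertools
from pathlib import Path


def unique_id(params: dict) -> str:
    """Hypothetical hash: a deterministic identifier from sorted parameter values."""
    blob = ";".join(f"{key}={params[key]}" for key in sorted(params))
    return hashlib.md5(blob.encode()).hexdigest()[:8]


# Hypothetical sweep values, as expanded from (starting, ending, step) ranges.
sweeps = {"dip": [0.0, 30.0, 60.0], "depth": [100.0, 200.0]}

# One simulation per combination, all other parameters held constant.
combinations = [dict(zip(sweeps, values)) for values in itertools.product(*sweeps.values())]

output_dir = Path("sweep_output")
for params in combinations:
    target = output_dir / f"{unique_id(params)}.geoh5"
    if target.exists():
        continue  # identifier already present in the output directory: skip
    # ...copy the input geoh5, update the Plate Simulation group with
    # `params`, run the simulation, and save the results to `target`...
```

Because the identifier depends only on the parameter values, re-running a sweep over an existing output directory only computes the combinations that are missing.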
