
Commit c7ca355

Merge branch 'develop' into GEOPY-2490
# Conflicts:
#   simpeg_drivers-assets/uijson/direct_current_2d_forward.ui.json
#   simpeg_drivers-assets/uijson/direct_current_2d_inversion.ui.json
#   simpeg_drivers-assets/uijson/direct_current_batch2d_inversion.ui.json
#   simpeg_drivers-assets/uijson/induced_polarization_2d_forward.ui.json
#   simpeg_drivers-assets/uijson/induced_polarization_2d_inversion.ui.json
#   simpeg_drivers-assets/uijson/induced_polarization_batch2d_inversion.ui.json
2 parents 3ff5a6a + b38bf2d commit c7ca355

94 files changed

Lines changed: 1458 additions & 940 deletions


docs/_toc.yml

Lines changed: 6 additions & 6 deletions
@@ -14,8 +14,8 @@ chapters:
 - file: fundamentals/regularization/rotated_gradients
 - file: fundamentals/mesh_design
 - file: fundamentals/joint_inversion
-- file: fundamentals/depth_of_investigation
 - file: fundamentals/optimization
+- file: fundamentals/parallelization
 - file: tutorials/introduction
   sections:
   - file: tutorials/background
@@ -34,10 +34,10 @@ chapters:
   - file: case_studies/Forrestania/python_code/unconstrained_gravity_inv_training
   - file: case_studies/Forrestania/python_code/unconstrained_magnetics_inv_training
   - file: case_studies/Forrestania/python_code/joint_grav_mag
-- file: plate-simulation/index
+- file: fundamentals/depth_of_investigation
+- file: plate-simulation/simulation
   sections:
-  - file: plate-simulation/usage
-    title: Basic Usage
-  - file: plate-simulation/methodology
-    title: Methodology
+  - file: plate-simulation/standalone
+  - file: plate-simulation/sweep
+  - file: plate-simulation/match
 - file: THIRD_PARTY_SOFTWARE

docs/conf.py

Lines changed: 1 addition & 0 deletions
@@ -42,6 +42,7 @@
 exclude_patterns = []
 todo_include_todos = True
 
+
 # -- Options for auto-doc ----------------------------------------------------
 # https://www.sphinx-doc.org/en/master/usage/extensions/autodoc.html#module-sphinx.ext.autodoc

docs/fundamentals/depth_of_investigation.ipynb

Lines changed: 57 additions & 56 deletions
@@ -7,15 +7,66 @@
 "source": [
 "# Depth of Investigation\n",
 "\n",
-"In geophysics, \"depth of investigation\" refers to the maximum depth below the surface from which a geophysical survey can reliably measure. It depends on factors like the survey design and physical properties of the subsurface material. Several strategies have been proposed to assess uncertainties in models recovered from inversion. {cite:t}`nabighian_1989` used a skin depth approach for electromagnetic surveys, assuming a background halfspace resistivity. {cite:t}`li_1999` implemented a cut-off value based on two inverted models obtained with slightly different assumptions. {cite:t}`christiansen_2012` proposed a mask based on the sum-square of sensitivities to estimate a volume of low confidence. In the following, we discuss the algorithm and implementation of the sensitivity cutoff strategy."
+"This application allows users to compute a depth of investigation from model values. In geophysics, \"depth of investigation\" refers to the maximum depth below the surface from which a geophysical survey can reliably measure. It depends on factors like the survey design and physical properties of the subsurface material.\n",
+"\n",
+"![masked_model](./images/masked_model.png)\n",
+"\n",
+"Several strategies have been proposed to assess uncertainties in models recovered from inversion. {cite:t}`nabighian_1989` used a skin depth approach for electromagnetic surveys, assuming a background halfspace resistivity. {cite:t}`li_1999` implemented a cut-off value based on two inverted models obtained with slightly different assumptions.\n",
+"\n",
+"This application uses the method proposed by {cite:t}`christiansen_2012`, based on the sum-square of sensitivities to estimate a volume of low confidence."
+]
+},
+{
+"cell_type": "markdown",
+"id": "83b4ae8f-5509-40b1-98c2-39acefba438c",
+"metadata": {
+"jp-MarkdownHeadingCollapsed": true
+},
+"source": [
+"## Interface\n",
+"\n",
+"The depth of investigation method based on sensitivity cutoffs relies on the ui.json interface.\n",
+"\n",
+"![interface](./images/uijson.png)\n",
+"\n",
+"### Inputs\n",
+"\n",
+"- **Mesh**: The mesh used for inversion\n",
+"- **Sensitivity**: Model selector for the sensitivities at a given iteration (see [Pre-requisite](pre-requisite))\n",
+"- **Cut-off**: Percentage value to threshold the sensitivities, relative to the **Method** used.\n",
+"- **Method**: One of *percentile*, *percent* or *log percent*\n",
+"\n",
+"![cutoff_options](./images/cutoff_options.png)\n",
+"\n",
+"### Output\n",
+"\n",
+"After running, the application will create a masking attribute saved on the mesh.\n",
+"\n",
+"![mask](./images/sensitivity_mask.png)\n",
+"\n",
+"The mask can then be applied to any of the iterations to show only the cells that exceeded the sensitivity threshold.\n",
+"\n",
+"![apply_mask](./images/apply_mask.png)\n",
+"\n",
+"\n",
+"(pre-requisite)=\n",
+"### Pre-requisite\n",
+"\n",
+"In order to save the sensitivities during a SimPEG inversion, the 'Save sensitivities' option must be selected from the 'Optional parameters' tab of an inversion.\n",
+"\n",
+"![save_sensitivities](./images/save_sensitivities.png)\n",
+"\n",
+"This will result in a new model generated and saved into the computational mesh at each iteration.\n",
+"\n",
+"![sensitivity_models](./images/sensitivity_models.png)"
 ]
 },
 {
 "cell_type": "markdown",
 "id": "c13ffcec-8d95-4ee9-876f-9aa92e2c534a",
 "metadata": {},
 "source": [
-"## Sensitivities\n",
+"## Methodology\n",
 "\n",
 "The sensitivity matrix is calculated as part of the optimization problem solved by SimPEG while inverting geophysical data. The sensitivity matrix represents the degree to which each predicted datum changes with respect to a perturbation in each model cell. It is given in matrix form by\n",
 "\n",
@@ -28,20 +79,20 @@
 "The depth of investigation mask is a property of the cells of the mesh only so the rows of the sensitivity matrix (data) are sum-square normalized as follows.\n",
 "\n",
 "$$\n",
-"\\mathbf{J}_{norm} = \\left[\\sum_{n=1}^{N}\\left(\\frac{\\mathbf{J}_{n:}}{w_n}\\right)^{2}\\right]^{(1/2)}\n",
+"\\|\\mathbf{j}\\| = \\left[\\sum_{n=1}^{N}\\left(\\frac{\\mathbf{J}_{n:}}{w_n}\\right)^{2}\\right]^{(1/2)}\n",
 "$$\n",
 "\n",
 "where $w_n$ are the data uncertainties associated with the $n^{th}$ datum.\n",
 "\n",
-"The resulting vector $J_{norm}$ can then be thought of as the degree to which the aggregate data changes due to a small perturbation in each model cell. The depth of investigation mask is then computed by thresholding those sensitivities"
+"The resulting vector $\\|\\mathbf{j}\\|$ can then be thought of as the degree to which the aggregate data changes due to a small perturbation in each model cell. The depth of investigation mask is then computed by thresholding those sensitivities."
 ]
 },
 {
 "cell_type": "markdown",
 "id": "d8bb85c5-5e63-4617-ab6b-a7b6536c1d2c",
 "metadata": {},
 "source": [
-"## Thresholding\n",
+"### Thresholding\n",
 "\n",
 "The depth of investigation can be estimated by assigning a threshold on the sum-squared sensitivity vector. This can be done as a percentile, percentage, or log-percentage. In the percentile method, the mask is formed by eliminating all cells in which the sensitivity falls below the lowest $n$% of the number of data where $n$ is the chosen cutoff. In the percent method the data are transformed into a percentage of the largest value\n",
 "\n",
@@ -51,56 +102,6 @@
 "\n",
 "and the mask is formed by eliminating all cells in which the sensitivity falls below the lowest $n$% of the data values where $n$ is the chosen cutoff. Finally, the log-percent mask transforms the data into log-space before carrying out the percentage thresholding described above."
 ]
-},
-{
-"cell_type": "markdown",
-"id": "83b4ae8f-5509-40b1-98c2-39acefba438c",
-"metadata": {},
-"source": [
-"## Usage\n",
-"\n",
-"The depth of investigation methods based on sensitivity cutoffs described above are exposed to Geoscience ANALYST Pro Geophysics users through a ui.json interface. In order to save the sensitivities during a SimPEG inversion, the 'Save sensitivities' option must be selected from the 'Optional parameters' tab of the SimPEG inversion ui.json window.\n",
-"\n",
-"![save_sensitivities](./images/save_sensitivities.png)\n",
-"\n",
-"This will result in a new model generated and saved into the computational mesh at each iteration.\n",
-"\n",
-"![sensitivity_models](./images/sensitivity_models.png)\n",
-"\n",
-"The ui.json interface allows the user to select a mesh from the **simpeg-drivers** result and any of the generated sensitivity models, a cutoff threshold, method, and optional name for the output.\n",
-"\n",
-"![interface](./images/uijson.png)\n",
-"\n",
-"The cutoff methods are selectable in the dropdown\n",
-"\n",
-"![cutoff_options](./images/cutoff_options.png)\n",
-"\n",
-"Whatever the options, the application will create a sensitivity cutoff mask saved on the mesh\n",
-"\n",
-"![mask](./images/sensitivity_mask.png)\n",
-"\n",
-"which can then be applied to any of the iterations to show only the cells that exceeded the sensitivity threshold.\n",
-"\n",
-"![apply_mask](./images/apply_mask.png)\n",
-"\n",
-"![masked_model](./images/masked_model.png)"
-]
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"id": "1079da0d-4ea2-4f00-b957-92199dd39cdb",
-"metadata": {},
-"outputs": [],
-"source": []
-},
-{
-"cell_type": "code",
-"execution_count": null,
-"id": "920f0637-4028-40c7-b2c6-5976f5e90afd",
-"metadata": {},
-"outputs": [],
-"source": []
 }
 ],
 "metadata": {
@@ -119,7 +120,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.10.16"
+"version": "3.10.19"
 }
 },
 "nbformat": 4,
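The sum-square normalization and the three cutoff options changed in this notebook can be sketched in numpy. The function below is an illustrative reconstruction of the thresholding logic, not the application's actual code; in particular, the exact rescaling convention used for the *log percent* method is an assumption.

```python
import numpy as np


def sensitivity_mask(J, w, cutoff, method="percentile"):
    """Illustrative sensitivity-cutoff mask (not the packaged implementation).

    J      : (n_data, n_cells) sensitivity matrix
    w      : (n_data,) data uncertainties
    cutoff : threshold, in percent
    """
    # Sum-square normalize the uncertainty-weighted rows: one value per cell.
    j_norm = np.sqrt(np.sum((J / w[:, None]) ** 2, axis=0))

    if method == "percentile":
        # Keep cells at or above the cutoff-th percentile of the values.
        return j_norm >= np.percentile(j_norm, cutoff)
    if method == "percent":
        # Express each value as a percentage of the largest, then threshold.
        return 100.0 * j_norm / j_norm.max() >= cutoff
    if method == "log percent":
        # Assumed convention: the percentage rule applied in log10 space,
        # rescaled between the smallest and largest log-values.
        log_j = np.log10(j_norm)
        rel = 100.0 * (log_j - log_j.min()) / (log_j.max() - log_j.min())
        return rel >= cutoff
    raise ValueError(f"Unknown method: {method}")
```

For example, `sensitivity_mask(J, w, 20.0, "percent")` keeps only the cells whose normalized sensitivity reaches at least 20% of the largest value.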

docs/fundamentals/images/distributed_parallelization.svg

Lines changed: 1 addition & 0 deletions
Lines changed: 59 additions & 0 deletions
.. _parallelization:

Parallelization
===============

This section describes the different levels of parallelization used by the inversion routines and how to optimize resources.
For a given inversion routine, the application decomposes the problem into a series of sub-problems, or tiles, each assigned a mesh and a survey. During the inversion process, the application continuously requests predicted data and derivatives from the sub-problems. The application parallelizes these operations within each sub-problem, as well as externally so that sub-problems can be computed concurrently.


.. figure:: ./images/distributed_parallelization.svg
    :align: center
    :width: 80%

    Schematic representation of the computing elements of a tiled inversion. Each tile receives an assigned mesh and a survey (single-frequency if applicable), with array operations parallelized by dask. The dask operations always bookend the direct solver when employed. Optionally, the workload can be distributed across multiple workers to optimize performance. Each worker is responsible for a subset of tiles, with a limited number of threads. Only 1-dimensional arrays are passed between the main process and the workers.

Dask
----

The `dask <https://www.dask.org/>`_ library handles most operations related to generating arrays. A mixture of ``dask.array`` and ``dask.delayed`` calls parallelizes the computations across multiple threads. If a direct solver is involved, the dask operations bookend the solver to avoid thread-safety issues. The application converts dask arrays to numpy arrays before passing them to the direct solvers and before returning them to the main process. Only 1-dimensional arrays are returned to the main process, while higher-dimensional arrays and solvers remain in the distributed memory of the workers.

Sensitivity matrices can optionally be stored on disk using the `zarr <https://zarr.readthedocs.io/en/stable/>`_ library, which is optimized for parallel read/write access. In this case, the workers never hold the entire sensitivity matrix in memory, but rather read and write small chunks of the matrix as needed. This allows for efficient handling of large sensitivity matrices that may not fit in memory, at the cost of increased disk I/O. The use of zarr is optional and can be enabled by setting the ``store_sensitivity`` parameter to true in the ui.json file.
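The chunk-at-a-time pattern can be pictured with a small stand-in sketch. To stay dependency-free, this uses ``numpy.memmap`` in place of zarr purely for illustration; the file name and matrix sizes are hypothetical, and the real implementation relies on zarr's chunked stores rather than a raw memory-mapped file.

```python
import os
import tempfile

import numpy as np

# Hypothetical sizes: a 1000 (data) x 500 (cells) sensitivity matrix,
# written to disk one row-block at a time so it never sits fully in memory.
n_data, n_cells, block = 1000, 500, 100
path = os.path.join(tempfile.mkdtemp(), "sensitivities.dat")
J = np.memmap(path, dtype="float64", mode="w+", shape=(n_data, n_cells))
for start in range(0, n_data, block):
    # In the real application each block would be a computed sensitivity chunk.
    J[start : start + block] = np.random.rand(block, n_cells)
J.flush()

# Later, a worker reads back only the rows (data) it is responsible for.
rows = np.asarray(J[200:300])
```

The trade-off is the one stated above: peak memory stays bounded by the block size, at the cost of extra disk I/O per read and write.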
Direct Solvers
--------------

A direct solver is used for any method evaluated by partial differential equation (PDE), such as electromagnetic and electric surveys. The `Pardiso <https://github.com/simpeg/pydiso>`_ and `Mumps <https://gitlab.kwant-project.org/kwant/python-mumps>`_ solvers parallelize operations during the factorization and backward substitution calls. Note that the current implementation of the solvers is not thread-safe and therefore cannot be shared across parallel processes. Any other level of parallelization needs to occur outside the direct solver calls or be encapsulated within a distributed process.

The number of threads used by the solvers can be set by running the command

.. code-block::

    set OMP_NUM_THREADS=X

before launching the Python program, where ``X`` is the desired thread count. Alternatively, setting ``OMP_NUM_THREADS`` as a local environment variable will set it permanently. The default value is the number of threads available on the machine.
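For scripted runs, the same variable can also be set from within Python, provided this happens before the OpenMP-backed solver libraries are first imported. This is a general OpenMP convention, not something specific to this package:

```python
import os

# Must be set before the OpenMP-backed solver libraries are imported;
# once their thread pool is initialized, changing it has no effect.
os.environ["OMP_NUM_THREADS"] = "8"
```

Any libraries imported after this point will see the eight-thread limit.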
.. _parallelization_distributed:

Dask.distributed
----------------

It has been found that parallel processes, both for dask.delayed and the direct solver, tend to saturate on large numbers of threads: too many small tasks overwhelm the scheduler at the cost of performance. This can be alleviated by resorting to the ``dask.distributed`` library to split the computation of tiles across multiple ``workers``. Each worker is responsible for a subset of the tiles and can be configured to use a limited number of threads. This allows for better resource utilization and improved performance, especially for large problems that require significant computational resources. The number of workers and threads (per worker) can be set with the following parameters added to the ui.json file:

.. code-block::

    {
        ...
        "n_workers": i,
        "n_threads": n,
        "performance_report": true
    }

where ``n_workers`` is the number of processes to spawn, and ``n_threads`` is the number of threads to use for each process. Setting ``performance_report`` to true will generate an ``html`` performance report at the end of the inversion, which can be used to identify bottlenecks and optimize the parallelization settings.

It is good practice to divide the available threads evenly among the workers to balance the load. Setting too many workers with too few threads increases the overhead from inter-process communication, while setting too few workers with too many threads saturates the direct solvers and reduces performance. For example, if the machine has 32 threads available, setting 4 workers with 8 threads each will fully use the resources.

It is also recommended to make the number of tiles a multiple of the number of workers, so that all workers are utilized. For example, with 8 tiles, setting 4 workers will allow each worker to process 2 tiles concurrently. If there are fewer tiles than workers, the program will automatically split surveys into smaller chunks, while preserving the mesh, to ensure an even load across the workers. This is less efficient than having a dedicated optimized mesh per tile, but will still provide performance benefits.
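The even-load recommendation can be pictured as a round-robin assignment of tiles to workers; ``assign_tiles`` below is a hypothetical helper written for illustration, not part of the package:

```python
def assign_tiles(n_tiles: int, n_workers: int) -> dict[int, list[int]]:
    """Round-robin assignment of tile indices to workers."""
    return {w: list(range(w, n_tiles, n_workers)) for w in range(n_workers)}


# 8 tiles over 4 workers: every worker processes exactly 2 tiles.
schedule = assign_tiles(8, 4)
```

When the tile count is not a multiple of the worker count, some workers receive one extra tile and finish later, which is exactly the idle-worker imbalance the recommendation above avoids.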

docs/images/plate-simulation/basic_usage/analyst_geophysics_menu.png renamed to docs/images/analyst_geophysics_menu.png

File renamed without changes.

docs/intro.md

Lines changed: 31 additions & 2 deletions
@@ -1,16 +1,45 @@
 # About
 
-This document contains training material for geophysical inversions using [SimPEG](https://simpeg.xyz/) and [Geoscience ANALYST](https://www.mirageoscience.com/mining-industry-software/geoscience-analyst/).
+This document contains training material for geophysical forward modeling and inversions using [SimPEG](https://simpeg.xyz/) and [Geoscience ANALYST](https://www.mirageoscience.com/mining-industry-software/geoscience-analyst/).
 ```{image} ./images/ore_body.png
 :width: 500px
 ```
 
-# Table of content
+
+# Table of contents
 
 ```{tableofcontents}
 ```
 
 
+# Running the applications
+
+The main entry points to the various modules are the [*.ui.json](https://github.com/MiraGeoscience/simpeg-drivers/blob/develop/simpeg_drivers-assets/uijson)
+files. The ``ui.json`` serves a dual purpose:
+
+(1) rendering a user-interface in Geoscience ANALYST
+
+(2) storing the input parameters chosen by the user for the program to run. See the [UIJson documentation](https://mirageoscience-geoh5py.readthedocs-hosted.com/en/latest/content/uijson_format/usage.html)
+for more information about the ui.json interface.
+
+The various user-interfaces can be accessed from the Geoscience ANALYST Pro Geophysics menu.
+
+```{image} ./images/analyst_geophysics_menu.png
+:width: 500px
+```
+
+The applications can also be run from the command line if all required fields in
+the ui.json are provided. This approach is useful for advanced users who want to
+automate processing or re-run an existing configuration with different parameters.
+
+To run any of the applications, assuming a valid Conda environment, use the following commands:
+
+``conda activate simpeg-drivers``
+
+``python -m simpeg_drivers.driver [YOUR.ui.json]``
+
+where ``[YOUR.ui.json]`` is the path to the input ui.json file on disk.
 
 # References
 
Two binary image files added (227 KB and 404 KB); previews not shown.
