Commit 6ab55ac

Merge pull request #1531 from Libensemble/examples/gpcam_notebook

Examples/gpcam notebook

2 parents 8c851b5 + f276392

12 files changed: 894 additions & 71 deletions

README.rst (6 additions, 1 deletion)

@@ -95,8 +95,10 @@ Try some other examples live in Colab.
 +---------------------------------------------------------------+-------------------------------------+
 | Optimization example that finds multiple minima. | |Optimization example| |
 +---------------------------------------------------------------+-------------------------------------+
+| Surrogate model generation with gpCAM. | |Surrogate Modeling| |
++---------------------------------------------------------------+-------------------------------------+
 
-There are many more examples in the `Community Examples repository`_ and `regression tests`_.
+There are many more examples in the `regression tests`_ and `Community Examples repository`_.
 
 Resources
 =========

@@ -177,3 +179,6 @@ Resources
 
 .. |Optimization example| image:: https://colab.research.google.com/assets/colab-badge.svg
    :target: http://colab.research.google.com/github/Libensemble/libensemble/blob/develop/examples/tutorials/aposmm/aposmm_tutorial_notebook.ipynb
+
+.. |Surrogate Modeling| image:: https://colab.research.google.com/assets/colab-badge.svg
+   :target: https://colab.research.google.com/github/Libensemble/libensemble/blob/develop/examples/tutorials/gpcam_surrogate_model/gpcam.ipynb
Binary image file added (38.3 KB)

docs/index.rst (1 addition, 0 deletions)

@@ -23,6 +23,7 @@
    tutorials/local_sine_tutorial
    tutorials/executor_forces_tutorial
    tutorials/forces_gpu_tutorial
+   tutorials/gpcam_tutorial
    tutorials/aposmm_tutorial
    tutorials/calib_cancel_tutorial

docs/tutorials/gpcam_tutorial.rst (new file, 340 additions)

Surrogate Modeling with gpCAM
=============================

This example uses gpCAM_ to construct a global surrogate of ``f`` values using a Gaussian process.

In each iteration, a batch of points is produced for concurrent evaluation, chosen to maximize uncertainty reduction.

|Open in Colab|

Ensure that libEnsemble and gpCAM are installed: ``pip install libensemble gpcam``
Generator function
------------------

The gpCAM generator function is called ``persistent_gpCAM``.

This persistent generator starts at the beginning of the ensemble run and operates until the ensemble shuts down.

This version (and others) of the gpCAM generator can be found at `libensemble/gen_funcs/persistent_gpCAM.py <https://github.com/Libensemble/libensemble/blob/main/libensemble/gen_funcs/persistent_gpCAM.py>`_ and, when libEnsemble is installed, can be imported as follows:

``from libensemble.gen_funcs.persistent_gpCAM import persistent_gpCAM``

.. code-block:: python

    import numpy as np
    from numpy.lib.recfunctions import repack_fields
    from gpcam import GPOptimizer as GP

    # Standard options for manager/generator comms
    from libensemble.message_numbers import EVAL_GEN_TAG, FINISHED_PERSISTENT_GEN_TAG, PERSIS_STOP, STOP_TAG
    from libensemble.tools.persistent_support import PersistentSupport


    def persistent_gpCAM(H_in, persis_info, gen_specs, libE_info):
        """Run a batched gpCAM model to create a surrogate"""

        # Initialize
        rng, batch_size, n, lb, ub, x_new, y_new, ps = _initialize_gpcAM(gen_specs["user"], libE_info)
        ask_max_iter = gen_specs["user"].get("ask_max_iter") or 10
        test_points = _read_testpoints(gen_specs["user"])
        noise = 1e-8  # Initial noise variance
        my_gp = None

        # Start with a batch of random points
        x_new = rng.uniform(lb, ub, (batch_size, n))
        H_o = np.zeros(batch_size, dtype=gen_specs["out"])
        H_o["x"] = x_new
        tag, Work, calc_in = ps.send_recv(H_o)  # Send random points for evaluation and wait for results

        while tag not in [STOP_TAG, PERSIS_STOP]:
            y_new = np.atleast_2d(calc_in["f"]).T
            my_gp = _update_gp(my_gp, x_new, y_new, test_points, persis_info, noise)

            # Request new points
            x_new = my_gp.ask(
                input_set=np.column_stack((lb, ub)),
                n=batch_size,
                pop_size=batch_size,
                acquisition_function="total correlation",
                max_iter=ask_max_iter,  # Larger takes longer. gpCAM default is 20.
            )["x"]

            H_o = np.zeros(batch_size, dtype=gen_specs["out"])
            H_o["x"] = x_new
            tag, Work, calc_in = ps.send_recv(H_o)  # Send points for evaluation and wait for results

        # If final points were returned, update the model
        if calc_in is not None:
            y_new = np.atleast_2d(calc_in["f"]).T
            my_gp = _update_gp(my_gp, x_new, y_new, test_points, persis_info, noise)

        return None, persis_info, FINISHED_PERSISTENT_GEN_TAG

Common acquisition functions include:

**Uncertainty reduction:**

- **"variance"** (default): The optimizer produces the N best points by posterior variance.
- **"total correlation"**: More expensive, but the points produced are self-avoiding.

**Bayesian optimization:**

These produce one point at a time unless the `HGDL <https://ieeexplore.ieee.org/abstract/document/9652812>`_ option is used.

- **"ucb" / "lcb"**: Upper/Lower Confidence Bound.
- **"expected improvement"**: Expected Improvement.

For more options, see https://gpcam.lbl.gov/examples/acquisition-functions
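To build intuition for what variance-based acquisition does, here is a schematic NumPy sketch. This is not gpCAM's implementation: the RBF kernel, the lengthscale, and the random candidate set are illustrative assumptions. It picks, from a finite set of candidates, the point where a simple zero-mean GP is most uncertain:

```python
import numpy as np


def rbf_kernel(A, B, lengthscale=0.5):
    """Squared-exponential kernel matrix between rows of A and rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / lengthscale**2)


def posterior_variance(x_train, x_cand, noise=1e-8):
    """Posterior variance of a zero-mean GP at candidate points."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf_kernel(x_cand, x_train)
    # var(x*) = k(x*, x*) - k(x*, X) K^-1 k(X, x*); k(x*, x*) = 1 for this kernel
    solve = np.linalg.solve(K, Ks.T)
    return 1.0 - np.einsum("ij,ji->i", Ks, solve)


rng = np.random.default_rng(0)
x_train = rng.uniform(-2, 2, (8, 2))  # Already-evaluated points
x_cand = rng.uniform(-2, 2, (100, 2))  # Candidate points

var = posterior_variance(x_train, x_cand)
x_next = x_cand[np.argmax(var)]  # "Variance" acquisition: most uncertain candidate
```

gpCAM's ``ask`` optimizes the acquisition function properly rather than searching a finite candidate set, but the selection principle is the same.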
----

The following code adds the functions used by ``persistent_gpCAM``.

``_update_gp`` is where the GP is fed the data and trained.

.. code-block:: python

    def _initialize_gpcAM(user_specs, libE_info):
        """Extract user params"""
        rng_seed = user_specs.get("rng_seed")  # Defaults to None
        rng = np.random.default_rng(rng_seed)  # Create random stream
        b = user_specs["batch_size"]
        lb = np.array(user_specs["lb"])
        ub = np.array(user_specs["ub"])
        n = len(lb)  # Number of dimensions
        init_x = np.empty((0, n))
        init_y = np.empty((0, 1))
        ps = PersistentSupport(libE_info, EVAL_GEN_TAG)  # Initialize comms
        return rng, b, n, lb, ub, init_x, init_y, ps


    def _read_testpoints(U):
        """Read numpy file containing evaluated points for measuring GP error"""
        test_points_file = U.get("test_points_file")
        if test_points_file is None:
            return None
        test_points = np.load(test_points_file)
        test_points = repack_fields(test_points[["x", "f"]])
        return test_points


    def _compare_testpoints(my_gp, test_points, persis_info):
        """Compare model at test points"""
        if test_points is None:
            return
        f_est = my_gp.posterior_mean(test_points["x"])["f(x)"]
        mse = np.mean((f_est - test_points["f"]) ** 2)
        persis_info.setdefault("mean_squared_error", []).append(float(mse))


    def _update_gp(my_gp, x_new, y_new, test_points, persis_info, noise):
        """Update Gaussian process with new points and train"""
        noise_arr = noise * np.ones(len(y_new))  # Noise variances for new points
        if my_gp is None:
            my_gp = GP(x_new, y_new.flatten(), noise_variances=noise_arr)
        else:
            my_gp.tell(x_new, y_new.flatten(), noise_variances=noise_arr, append=True)
        my_gp.train()

        if test_points is not None:
            _compare_testpoints(my_gp, test_points, persis_info)

        return my_gp
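``_read_testpoints`` relies on NumPy structured arrays: the saved history file stores named fields such as ``x`` and ``f``, and ``repack_fields`` makes a compact copy containing only the selected fields. A minimal self-contained sketch of that pattern (the two-row array here is a hypothetical stand-in for a real history file):

```python
import numpy as np
from numpy.lib.recfunctions import repack_fields

# A miniature history-like structured array with named fields
H = np.zeros(2, dtype=[("x", float, (2,)), ("f", float), ("sim_id", int)])
H["x"] = [[0.0, 0.5], [1.0, -0.5]]
H["f"] = [3.1, -0.2]
H["sim_id"] = [0, 1]

# Select only the fields needed for test points and repack into a compact dtype
test_points = repack_fields(H[["x", "f"]])

print(test_points.dtype.names)  # ("x", "f")
```

Selecting multiple fields (``H[["x", "f"]]``) returns a view that keeps the original memory layout; ``repack_fields`` removes the gaps left by dropped fields.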
Simulator function
------------------

Simulator functions or ``sim_f``\ s perform calculations based on parameters created in the generator function.
Each worker runs a copy of this function in parallel.

The function here is the simple 2D ``six_hump_camel``, for demonstration purposes.

For running applications that use parallel resources in the simulator, see the `forces examples <https://github.com/Libensemble/libensemble/tree/main/libensemble/tests/scaling_tests/forces/forces_simple>`_.

.. code-block:: python

    # Define our simulation function
    import numpy as np


    def six_hump_camel(H, persis_info, sim_specs, _):
        """Six-Hump Camel sim_f."""

        batch = len(H["x"])  # Number of evaluations in each sim_f call
        H_o = np.zeros(batch, dtype=sim_specs["out"])  # Define output array H_o

        for i, x in enumerate(H["x"]):
            H_o["f"][i] = six_hump_camel_func(x)  # Function evaluations placed into H_o

        return H_o, persis_info


    def six_hump_camel_func(x):
        """Six-Hump Camel function definition"""
        x1 = x[0]
        x2 = x[1]
        term1 = (4 - 2.1 * x1**2 + (x1**4) / 3) * x1**2
        term2 = x1 * x2
        term3 = (-4 + 4 * x2**2) * x2**2

        return term1 + term2 + term3
Calling Script
--------------

Our calling script configures libEnsemble, the generator function, and the simulator function. It then creates the ensemble object and runs the ensemble.

First we create a cleanup function so we can easily re-run.

.. code-block:: python

    # To rerun this notebook, we need to delete the ensemble directory.
    import shutil


    def cleanup():
        shutil.rmtree("ensemble", ignore_errors=True)

This calling script imports the generator and simulator functions from their locations in the installed libensemble package.
If you wish to make your own functions based on the above, those can be imported instead.

.. code-block:: python

    import numpy as np
    from pprint import pprint

    from libensemble import Ensemble
    from libensemble.specs import LibeSpecs, GenSpecs, SimSpecs, AllocSpecs, ExitCriteria

    # If importing from libensemble
    from libensemble.gen_funcs.persistent_gpCAM import persistent_gpCAM
    from libensemble.sim_funcs.six_hump_camel import six_hump_camel

    from libensemble.alloc_funcs.start_only_persistent import only_persistent_gens
    import warnings

    warnings.filterwarnings("ignore", message="Default hyperparameter_bounds")
    warnings.filterwarnings("ignore", message="Hyperparameters initialized")

    nworkers = 4

    # When using gen_on_manager, nworkers is the number of concurrent sims.
    # final_gen_send means the last evaluated points are returned to the generator to update the model.
    libE_specs = LibeSpecs(nworkers=nworkers, gen_on_manager=True, final_gen_send=True)

    n = 2  # Input dimensions
    batch_size = 4
    num_batches = 6

    gen_specs = GenSpecs(
        gen_f=persistent_gpCAM,  # Generator function
        persis_in=["f"],  # Objective, defined in sim, is returned to gen
        outputs=[("x", float, (n,))],  # Parameters (name, type, size)
        user={
            "batch_size": batch_size,
            "lb": np.array([-2, -1]),  # Lower boundaries for n dimensions
            "ub": np.array([2, 1]),  # Upper boundaries for n dimensions
            "ask_max_iter": 5,  # Number of iterations for ask (gpCAM default is 20)
            "rng_seed": 0,
        },
    )

    sim_specs = SimSpecs(
        sim_f=six_hump_camel,  # Simulator function
        inputs=["x"],  # Input field names. "x" defined in gen
        outputs=[("f", float)],  # Objective
    )

    # Starts one persistent generator. Simulated values are returned in batch.
    alloc_specs = AllocSpecs(
        alloc_f=only_persistent_gens,
        user={"async_return": False},  # False = batch returns
    )

    exit_criteria = ExitCriteria(sim_max=num_batches * batch_size)

    # Initialize the ensemble.
    ensemble = Ensemble(
        libE_specs=libE_specs,
        sim_specs=sim_specs,
        gen_specs=gen_specs,
        alloc_specs=alloc_specs,
        exit_criteria=exit_criteria,
    )
At the end of our calling script, we run the ensemble.

.. code-block:: python

    # To ensure re-running works, clean output and reset any persistent information
    cleanup()
    ensemble.persis_info = {}

    H, persis_info, flag = ensemble.run()  # Start the ensemble. Blocks until completion.
    ensemble.save_output("H_array", append_attrs=False)  # Save H (history of all evaluated points) to file
    pprint(H[["sim_id", "x", "f"]][:16])  # See first 16 results
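The saved history can be reloaded with ``np.load`` as a structured array whose fields are accessed by name; the rerun step below does exactly this via ``test_points_file``. A self-contained sketch of the round trip, using an in-memory buffer and a stand-in three-row history rather than the real ``H_array.npy``:

```python
import io

import numpy as np

# Stand-in for the ensemble history: named fields like the real H array
H = np.zeros(3, dtype=[("sim_id", int), ("x", float, (2,)), ("f", float)])
H["sim_id"] = np.arange(3)
H["x"] = np.random.default_rng(0).uniform(-2, 2, (3, 2))
H["f"] = [1.0, -0.5, 0.25]

# The tutorial writes "H_array.npy" to disk; a BytesIO buffer works the same way
buf = io.BytesIO()
np.save(buf, H)
buf.seek(0)
H_loaded = np.load(buf)

print(H_loaded["f"])  # Fields are accessed by name after reload
```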
Rerun and test model at known points
------------------------------------

To see how the accuracy of the surrogate model improves, we can use previously evaluated points as test points and run again with a different seed.

.. code-block:: python

    ensemble.gen_specs.user["rng_seed"] = 123
    ensemble.gen_specs.user["test_points_file"] = "H_array.npy"  # Our previous output file

    # To ensure re-running works, clean output and reset any persistent information
    cleanup()
    ensemble.persis_info = {}

    H, persis_info, flag = ensemble.run()
    print(persis_info)
Viewing model progression
-------------------------

Now we can check how our model's values compare against the values at known test points as the ensemble progresses.
The comparison is based on the **mean squared error** between the gpCAM model and our known
values at the test points.

.. code-block:: python

    import matplotlib
    import matplotlib.pyplot as plt

    # Get "mean_squared_error" from the generator's return (worker 0, as we ran gen_on_manager)
    mse = persis_info[0]["mean_squared_error"]
    niter = len(mse)
    num_sims = list(range(batch_size, (niter * batch_size) + 1, batch_size))

    # Plot the data
    markersize = 10
    plt.figure(figsize=(10, 5))
    plt.plot(
        num_sims, mse, marker="^", markeredgecolor="black", markeredgewidth=2,
        markersize=markersize, linewidth=2, label="Mean squared error",
    )
    plt.xticks(num_sims)

    # Label the axes and the legend
    plt.title("Mean squared error at test points")
    plt.xlabel("Number of simulations")
    plt.ylabel("Mean squared error")
    plt.legend(framealpha=1, edgecolor="black")
    plt.grid(True)
    plt.show()

The plot should look similar to the following.

.. note::
    The graph may differ between runs because, although we seed libEnsemble's random number generator,
    gpCAM introduces some randomness when initializing hyperparameters.

.. image:: ../images/gpcam_model_improvement.png
    :alt: gpcam_model_improvement
    :align: center

.. _gpCAM: https://github.com/lbl-camera/gpCAM
.. |Open in Colab| image:: https://colab.research.google.com/assets/colab-badge.svg
   :target: http://colab.research.google.com/github/Libensemble/libensemble/blob/develop/examples/tutorials/gpcam_surrogate_model/gpcam.ipynb

docs/tutorials/tutorials.rst (1 addition, 0 deletions)

@@ -6,5 +6,6 @@ Tutorials
    local_sine_tutorial
    executor_forces_tutorial
    forces_gpu_tutorial
+   gpcam_tutorial
    aposmm_tutorial
    calib_cancel_tutorial
