|
| 1 | +Surrogate Modeling with gpCAM |
| 2 | +============================= |
| 3 | + |
| 4 | +This example uses gpCAM_ to construct a global surrogate of ``f`` values using a Gaussian process. |
| 5 | + |
| 6 | +In each iteration, a batch of points is produced for concurrent evaluation, maximizing uncertainty reduction. |
| 7 | + |
| 8 | +|Open in Colab| |
| 9 | + |
| 10 | +Ensure that libEnsemble and gpCAM are installed via: ``pip install libensemble gpcam``
| 11 | + |
| 12 | +Generator function |
| 13 | +------------------
| 14 | + |
| 15 | +The gpCAM generator function is called ``persistent_gpCAM``. |
| 16 | + |
| 17 | +This persistent generator is started at the beginning of the Ensemble and runs until the Ensemble closes down. |
| 18 | + |
| 19 | +This version (and others) of the gpCAM generator can be found at `libensemble/gen_funcs/persistent_gpCAM.py <https://github.com/Libensemble/libensemble/blob/main/libensemble/gen_funcs/persistent_gpCAM.py>`_ and can be imported from that location when libEnsemble is installed as follows: |
| 20 | + |
| 21 | +``from libensemble.gen_funcs.persistent_gpCAM import persistent_gpCAM`` |
| 22 | + |
| 23 | +.. code-block:: python |
| 24 | +
|
| 25 | + import numpy as np |
| 26 | + from numpy.lib.recfunctions import repack_fields |
| 27 | + from gpcam import GPOptimizer as GP |
| 28 | +
|
| 29 | + # Standard options for manager/generator comms |
| 30 | + from libensemble.message_numbers import EVAL_GEN_TAG, FINISHED_PERSISTENT_GEN_TAG, PERSIS_STOP, STOP_TAG |
| 31 | + from libensemble.tools.persistent_support import PersistentSupport |
| 32 | +
|
| 33 | + def persistent_gpCAM(H_in, persis_info, gen_specs, libE_info): |
| 34 | + """Run a batched gpCAM model to create a surrogate""" |
| 35 | +
|
| 36 | + # Initialize |
| 37 | + rng, batch_size, n, lb, ub, x_new, y_new, ps = _initialize_gpcAM(gen_specs["user"], libE_info) |
| 38 | + ask_max_iter = gen_specs["user"].get("ask_max_iter") or 10 |
| 39 | + test_points = _read_testpoints(gen_specs["user"]) |
| 40 | + noise = 1e-8 # Initializes noise |
| 41 | + my_gp = None |
| 42 | +
|
| 43 | + # Start with a batch of random points |
| 44 | + x_new = rng.uniform(lb, ub, (batch_size, n)) |
| 45 | + H_o = np.zeros(batch_size, dtype=gen_specs["out"]) |
| 46 | + H_o["x"] = x_new |
| 47 | + tag, Work, calc_in = ps.send_recv(H_o) # Send random points for evaluation and wait for results |
| 48 | +
|
| 49 | + while tag not in [STOP_TAG, PERSIS_STOP]: |
| 50 | + y_new = np.atleast_2d(calc_in["f"]).T |
| 51 | + my_gp = _update_gp(my_gp, x_new, y_new, test_points, persis_info, noise) |
| 52 | +
|
| 53 | + # Request new points |
| 54 | + x_new = my_gp.ask( |
| 55 | + input_set=np.column_stack((lb, ub)), |
| 56 | + n=batch_size, |
| 57 | + pop_size=batch_size, |
| 58 | + acquisition_function="total correlation", |
| 59 | + max_iter=ask_max_iter, # Larger takes longer. gpCAM default is 20. |
| 60 | + )["x"] |
| 61 | +
|
| 62 | + H_o = np.zeros(batch_size, dtype=gen_specs["out"]) |
| 63 | + H_o["x"] = x_new |
| 64 | + tag, Work, calc_in = ps.send_recv(H_o) # Send points for evaluation and wait for results |
| 65 | +
|
| 66 | + # If final points were returned update the model |
| 67 | + if calc_in is not None: |
| 68 | + y_new = np.atleast_2d(calc_in["f"]).T |
| 69 | + my_gp = _update_gp(my_gp, x_new, y_new, test_points, persis_info, noise) |
| 70 | +
|
| 71 | + return None, persis_info, FINISHED_PERSISTENT_GEN_TAG |
| 72 | +
|
| 73 | +Common acquisition functions include: |
| 74 | + |
| 75 | +**Uncertainty reduction:** |
| 76 | + |
| 77 | +- **"variance"** (default): The optimizer will produce N best points. |
| 78 | +- **"total correlation"**: More expensive but points produced are self-avoiding. |
| 79 | + |
| 80 | +**Bayesian optimization:** |
| 81 | + |
| 82 | +These produce one point at a time unless using the `HGDL <https://ieeexplore.ieee.org/abstract/document/9652812>`_ option. |
| 83 | + |
| 84 | +- **"ucb" / "lcb"**: Upper/Lower Confidence Bound. |
| 85 | +- **"expected improvement"**: Expected Improvement. |
| 86 | + |
| 87 | +For more options see: https://gpcam.lbl.gov/examples/acquisition-functions |
| 88 | + |
| 89 | +---- |
| 90 | + |
| 91 | +The following code adds the functions used by ``persistent_gpCAM``. |
| 92 | + |
| 93 | +``_update_gp`` is where the GP is fed the data and trained. |
| 94 | + |
| 95 | +.. code-block:: python |
| 96 | +
|
| 97 | + def _initialize_gpcAM(user_specs, libE_info): |
| 98 | + """Extract user params""" |
| 99 | + rng_seed = user_specs.get("rng_seed") # will default to None |
| 100 | + rng = np.random.default_rng(rng_seed) # Create random stream |
| 101 | + b = user_specs["batch_size"] |
| 102 | + lb = np.array(user_specs["lb"]) |
| 103 | + ub = np.array(user_specs["ub"]) |
| 104 | + n = len(lb) # no. of dimensions |
| 105 | + init_x = np.empty((0, n)) |
| 106 | + init_y = np.empty((0, 1)) |
| 107 | + ps = PersistentSupport(libE_info, EVAL_GEN_TAG) # init comms |
| 108 | + return rng, b, n, lb, ub, init_x, init_y, ps |
| 109 | +
|
| 110 | +
|
| 111 | + def _read_testpoints(U): |
| 112 | + """Read numpy file containing evaluated points for measuring GP error""" |
| 113 | + test_points_file = U.get("test_points_file") |
| 114 | + if test_points_file is None: |
| 115 | + return None |
| 116 | + test_points = np.load(test_points_file) |
| 117 | + test_points = repack_fields(test_points[["x", "f"]]) |
| 118 | + return test_points |
| 119 | +
|
| 120 | +
|
| 121 | + def _compare_testpoints(my_gp, test_points, persis_info): |
| 122 | + """Compare model at test points""" |
| 123 | + if test_points is None: |
| 124 | + return |
| 125 | + f_est = my_gp.posterior_mean(test_points["x"])["f(x)"] |
| 126 | + mse = np.mean((f_est - test_points["f"]) ** 2) |
| 127 | + persis_info.setdefault("mean_squared_error", []).append(float(mse)) |
| 128 | +
|
| 129 | +
|
| 130 | + def _update_gp(my_gp, x_new, y_new, test_points, persis_info, noise): |
| 131 | + """Update Gaussian process with new points and train""" |
| 132 | + noise_arr = noise * np.ones(len(y_new)) # Initializes noise |
| 133 | + if my_gp is None: |
| 134 | + my_gp = GP(x_new, y_new.flatten(), noise_variances=noise_arr) |
| 135 | + else: |
| 136 | + my_gp.tell(x_new, y_new.flatten(), noise_variances=noise_arr, append=True) |
| 137 | + my_gp.train() |
| 138 | +
|
| 139 | + if test_points is not None: |
| 140 | + _compare_testpoints(my_gp, test_points, persis_info) |
| 141 | +
|
| 142 | + return my_gp |
| 143 | +
|
| 144 | +Simulator function |
| 145 | +------------------ |
| 146 | + |
| 147 | +Simulator functions or ``sim_f``\ s perform calculations based on parameters created in the generator function. |
| 148 | +Each worker runs a copy of this function in parallel. |
| 149 | + |
| 150 | +The function here is the simple 2D ``six_hump_camel``, for demonstration purposes. |
| 151 | + |
| 152 | +For running applications using parallel resources in the simulator see the `forces examples <https://github.com/Libensemble/libensemble/tree/main/libensemble/tests/scaling_tests/forces/forces_simple>`_. |
| 153 | + |
| 154 | +.. code-block:: python |
| 155 | +
|
| 156 | + # Define our simulation function |
| 157 | + import numpy as np |
| 158 | +
|
| 159 | + def six_hump_camel(H, persis_info, sim_specs, _): |
| 160 | + """Six-Hump Camel sim_f.""" |
| 161 | +
|
| 162 | + batch = len(H["x"]) # Num evaluations each sim_f call. |
| 163 | + H_o = np.zeros(batch, dtype=sim_specs["out"]) # Define output array H |
| 164 | +
|
| 165 | + for i, x in enumerate(H["x"]): |
| 166 | + H_o["f"][i] = six_hump_camel_func(x) # Function evaluations placed into H |
| 167 | +
|
| 168 | + return H_o, persis_info |
| 169 | +
|
| 170 | +
|
| 171 | + def six_hump_camel_func(x): |
| 172 | + """Six-Hump Camel function definition""" |
| 173 | + x1 = x[0] |
| 174 | + x2 = x[1] |
| 175 | + term1 = (4 - 2.1 * x1**2 + (x1**4) / 3) * x1**2 |
| 176 | + term2 = x1 * x2 |
| 177 | + term3 = (-4 + 4 * x2**2) * x2**2 |
| 178 | +
|
| 179 | + return term1 + term2 + term3 |
| 180 | +
|
| 181 | +Calling Script |
| 182 | +--------------
| 183 | + |
| 184 | +Our calling script configures libEnsemble, the generator function, and the simulator function. It then creates the ensemble object and runs the ensemble.
| 185 | + |
| 186 | +First we will create a cleanup script so we can easily re-run. |
| 187 | + |
| 188 | +.. code-block:: python |
| 189 | +
|
| 190 | + # To rerun this notebook, we need to delete the ensemble directory. |
| 191 | + import shutil |
| 192 | + def cleanup(): |
| 193 | + try: |
| 194 | + shutil.rmtree("ensemble") |
| 195 | + except: |
| 196 | + pass |
| 197 | +
|
| 198 | +This calling script imports the Gen and Sim functions from the locations in the installed libensemble package. |
| 199 | +If you wish to make your own functions based on the above, those can be imported instead. |
| 200 | + |
| 201 | +.. code-block:: python |
| 202 | +
|
| 203 | + import numpy as np |
| 204 | + from pprint import pprint |
| 205 | +
|
| 206 | + from libensemble import Ensemble |
| 207 | + from libensemble.specs import LibeSpecs, GenSpecs, SimSpecs, AllocSpecs, ExitCriteria |
| 208 | +
|
| 209 | + # If importing from libensemble |
| 210 | + from libensemble.gen_funcs.persistent_gpCAM import persistent_gpCAM |
| 211 | + from libensemble.sim_funcs.six_hump_camel import six_hump_camel |
| 212 | +
|
| 213 | + from libensemble.alloc_funcs.start_only_persistent import only_persistent_gens |
| 214 | + import warnings |
| 215 | +
|
| 216 | + warnings.filterwarnings("ignore", message="Default hyperparameter_bounds") |
| 217 | + warnings.filterwarnings("ignore", message="Hyperparameters initialized") |
| 218 | +
|
| 219 | + nworkers = 4 |
| 220 | +
|
| 221 | + # When using gen_on_manager, nworkers is number of concurrent sims. |
| 222 | + # final_gen_send means the last evaluated points are returned to the generator to update the model. |
| 223 | + libE_specs = LibeSpecs(nworkers=nworkers, gen_on_manager=True, final_gen_send=True) |
| 224 | +
|
| 225 | + n = 2 # Input dimensions |
| 226 | + batch_size = 4 |
| 227 | + num_batches = 6 |
| 228 | +
|
| 229 | + gen_specs = GenSpecs( |
| 230 | + gen_f=persistent_gpCAM, # Generator function |
| 231 | + persis_in=["f"], # Objective, defined in sim, is returned to gen |
| 232 | + outputs=[("x", float, (n,))], # Parameters (name, type, size) |
| 233 | + user={ |
| 234 | + "batch_size": batch_size, |
| 235 | + "lb": np.array([-2, -1]), # lower boundaries for n dimensions |
| 236 | + "ub": np.array([2, 1]), # upper boundaries for n dimensions |
| 237 | + "ask_max_iter": 5, # Number of iterations for ask (default 20) |
| 238 | + "rng_seed": 0, |
| 239 | + }, |
| 240 | + ) |
| 241 | +
|
| 242 | + sim_specs = SimSpecs( |
| 243 | + sim_f=six_hump_camel, # Simulator function |
| 244 | + inputs=["x"], # Input field names. "x" defined in gen |
| 245 | + outputs=[("f", float)], # Objective |
| 246 | + ) |
| 247 | +
|
| 248 | + # Starts one persistent generator. Simulated values are returned in batch. |
| 249 | + alloc_specs = AllocSpecs( |
| 250 | + alloc_f=only_persistent_gens, |
| 251 | + user={"async_return": False}, # False = batch returns |
| 252 | + ) |
| 253 | +
|
| 254 | + exit_criteria = ExitCriteria(sim_max=num_batches*batch_size) |
| 255 | +
|
| 256 | + # Initialize and run the ensemble. |
| 257 | + ensemble = Ensemble( |
| 258 | + libE_specs=libE_specs, |
| 259 | + sim_specs=sim_specs, |
| 260 | + gen_specs=gen_specs, |
| 261 | + alloc_specs=alloc_specs, |
| 262 | + exit_criteria=exit_criteria, |
| 263 | + ) |
| 264 | +
|
| 265 | +At the end of our calling script we run the ensemble. |
| 266 | + |
| 267 | +.. code-block:: python |
| 268 | +
|
| 269 | + # To ensure re-running works - clean output and reset any persistent information |
| 270 | + cleanup() |
| 271 | + ensemble.persis_info = {} |
| 272 | +
|
| 273 | + H, persis_info, flag = ensemble.run() # Start the ensemble. Blocks until completion. |
| 274 | + ensemble.save_output("H_array", append_attrs=False) # Save H (history of all evaluated points) to file |
| 275 | + pprint(H[["sim_id", "x", "f"]][:16]) # See first 16 results |
| 276 | +
|
| 277 | +Rerun and test model at known points |
| 278 | +------------------------------------
| 279 | + |
| 280 | +To see how the accuracy of the surrogate model improves, we can use previously evaluated points as test points and run again with a different seed. |
| 281 | + |
| 282 | +.. code-block:: python |
| 283 | +
|
| 284 | + ensemble.gen_specs.user["rng_seed"] = 123 |
| 285 | + ensemble.gen_specs.user["test_points_file"] = "H_array.npy" # our previous file |
| 286 | +
|
| 287 | + # To ensure re-running works - clean output and reset any persistent information |
| 288 | + cleanup() |
| 289 | + ensemble.persis_info = {} |
| 290 | +
|
| 291 | + H, persis_info, flag = ensemble.run() |
| 292 | + print(persis_info) |
| 293 | +
|
| 294 | +Viewing model progression |
| 295 | +-------------------------
| 296 | + |
| 297 | +Now we can check how our model's values compared against the values at known test points as the ensemble progresses. |
| 298 | +The comparison is based on the **mean squared error** between the gpCAM model and our known |
| 299 | +values at the test points. |
| 300 | + |
| 301 | +.. code-block:: python |
| 302 | +
|
| 303 | + import matplotlib |
| 304 | + import matplotlib.pyplot as plt |
| 305 | +
|
| 306 | + # Get "mean_squared_error" from the generator's return (worker 0 as we ran gen_on_manager)
| 307 | + mse = persis_info[0]["mean_squared_error"] |
| 308 | + niter = len(mse) |
| 309 | + num_sims = list(range(batch_size, (niter * batch_size) + 1, batch_size)) |
| 310 | +
|
| 311 | + # Plotting the data |
| 312 | + markersize = 10 |
| 313 | + plt.figure(figsize=(10, 5)) |
| 314 | + plt.plot( |
| 315 | + num_sims, mse, marker="^", markeredgecolor="black", markeredgewidth=2, |
| 316 | + markersize=markersize, linewidth=2, label="Mean squared error" |
| 317 | + ) |
| 318 | + plt.xticks(num_sims) |
| 319 | +
|
| 320 | + # Labeling the axes and the legend |
| 321 | + plt.title('Mean Squared Error at test points') |
| 322 | + plt.xlabel("Number of simulations") |
| 323 | + plt.ylabel("Mean squared error")
| 324 | + legend = plt.legend(framealpha=1, edgecolor="black") # Opaque legend with black border
| 325 | + plt.grid(True) |
| 326 | + plt.show() |
| 327 | +
|
| 328 | +The plot should look similar to the following. |
| 329 | + |
| 330 | +.. note:: |
| 331 | + The graph may differ between runs because, although we seed libEnsemble's random number generator, |
| 332 | + gpCAM introduces some randomness when initializing hyperparameters. |
| 333 | + |
| 334 | +.. image:: ../images/gpcam_model_improvement.png |
| 335 | + :alt: gpcam_model_improvement |
| 336 | + :align: center |
| 337 | + |
| 338 | +.. _gpCAM: https://github.com/lbl-camera/gpCAM |
| 339 | +.. |Open in Colab| image:: https://colab.research.google.com/assets/colab-badge.svg |
| 340 | + :target: http://colab.research.google.com/github/Libensemble/libensemble/blob/develop/examples/tutorials/gpcam_surrogate_model/gpcam.ipynb |
0 commit comments