From ea4bbf0d6258267a35fcb25738df6407c6d70d45 Mon Sep 17 00:00:00 2001 From: shudson Date: Thu, 9 Apr 2026 01:35:08 -0500 Subject: [PATCH] Add generate_scripts skill --- .claude/skills/generate-scripts/SKILL.md | 147 ++++++++++++++++++ .../generate-scripts/references/aposmm.md | 146 +++++++++++++++++ .../references/finding_objectives.md | 31 ++++ .../generate-scripts/references/generators.md | 81 ++++++++++ .../references/results_metadata.md | 42 +++++ 5 files changed, 447 insertions(+) create mode 100644 .claude/skills/generate-scripts/SKILL.md create mode 100644 .claude/skills/generate-scripts/references/aposmm.md create mode 100644 .claude/skills/generate-scripts/references/finding_objectives.md create mode 100644 .claude/skills/generate-scripts/references/generators.md create mode 100644 .claude/skills/generate-scripts/references/results_metadata.md diff --git a/.claude/skills/generate-scripts/SKILL.md b/.claude/skills/generate-scripts/SKILL.md new file mode 100644 index 000000000..ebdd893cc --- /dev/null +++ b/.claude/skills/generate-scripts/SKILL.md @@ -0,0 +1,147 @@ +--- +name: generate-scripts +description: Generate libEnsemble calling scripts based on user requirements +--- + +You are generating libEnsemble scripts. libEnsemble coordinates parallel simulations +with generator-directed optimization or sampling. You will produce a calling script +and, when an external application is involved, a sim function file. + +libEnsemble repository: https://github.com/Libensemble/libensemble +If not running inside the libEnsemble repo, find examples and source code there. + +## Workflow + +1. If converting an existing Xopt or Optimas workflow to libEnsemble, use the + existing generator and VOCS settings exactly as-is — even if it is a sampling + or exploration generator. Do not switch to a classic generator unless the user + specifically asks. 
Otherwise, if there is not a clear generator to use, read `references/generators.md`
   to determine which generator and style to use. If a specific generator is
   identified (e.g., APOSMM), read its dedicated guide (e.g., `references/aposmm.md`).

2. Find a relevant example in `libensemble/tests/regression_tests/` and read it as a
   reference. Some examples:
   - Xopt Bayesian optimization (VOCS): `test_xopt_EI_initial_sample.py` — the best Xopt
     example, as it demonstrates the initial sampling approach Xopt generators need
   - Optimas Ax optimization (VOCS): `test_optimas_ax_sf.py`
   - APOSMM with NLopt (classic): `test_persistent_aposmm_nlopt.py`
   - Random uniform sampling (classic): `test_1d_sampling.py`
   Use glob and grep to find others matching the generator or pattern needed.
   The regression tests have clear descriptions in their docstrings.

3. Write the calling script, adapting the example to the user's requirements.
   Do not copy test boilerplate from examples
   (e.g., "Execute via one of the following commands..." headers). Set nworkers
   directly in the script (in LibeSpecs) — do not use parse_args or command-line
   arguments unless the user asks for that. If parse_args is not used and no
   options are taken, never suggest running with "-n/nworkers" or comms options.
   Those options apply only when parse_args is used (as in the tests).

4. If the user has an external application (executable), also write a sim function file
   that uses the executor to run it.

5. If the user provides an input file, check whether it has Jinja2 template markers
   (`{{ varname }}`). If not, create a templated copy: replace parameter values with
   `{{ name }}` markers matching `input_names` in sim_specs (case-sensitive). The sim
   function uses `jinja2.Template` to render the file before each simulation. Never
   modify the user's original file.

6. 
Verify the scripts:
   - Bounds and dimension match the user's request
   - Executable path is correct
   - For VOCS: variable names are consistent between VOCS definition and sim function
   - For APOSMM: gen_specs outputs include all required fields
   - Input file template markers match input_names (case-sensitive)
   - The app_name in submit() matches register_app()

7. Present a concise summary highlighting: generator choice, bounds, parameters,
   sim_max, and objective field. Do NOT suggest `mpirun` or another MPI
   runner (srun, mpiexec, etc.) to launch libEnsemble unless the user explicitly
   asks for MPI-based comms.

8. Ask the user if they want to run the scripts.

9. If running: execute with `python script.py`. Do not use `mpirun` or another MPI
   runner (srun, mpiexec, etc.) to launch libEnsemble unless the user explicitly
   asks for MPI-based comms for distributing workers. This is unrelated to the
   MPIExecutor, which workers use to launch simulation applications across nodes
   — libEnsemble manages node allocation.
   If scripts fail, retry if you can see a fix; otherwise stop. After a successful
   run, read `references/results_metadata.md` and
   `references/finding_objectives.md` to interpret the output.

## Generator style

VOCS (gest-api) is the default style. It uses a VOCS object to define variables and
objectives, and a generator object from Xopt or Optimas. Use VOCS unless the user
explicitly asks for the classic style or the generator only exists in classic form
(e.g., APOSMM, persistent_sampling). 
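
The input-file templating described in workflow step 5 can be sketched as follows (the file contents and parameter names here are hypothetical, chosen only for illustration):

```python
from jinja2 import Template

# A templated copy of a user's input file: parameter values replaced
# by {{ name }} markers that match input_names in sim_specs
templated_text = "x0 = {{ x0 }}\nx1 = {{ x1 }}\nsteps = 100\n"

# The sim function renders the template with trial values before each run
rendered = Template(templated_text).render(x0=0.5, x1=-1.2)
print(rendered)
```

Only the parameter values become markers; fixed settings (like `steps` above) stay literal, so each rendered file differs from the original only in the swept parameters.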
+ +## Defaults + +- nworkers defaults to 4 unless the user specifies otherwise (or 1 for sequential + generators like Nelder-Mead) +- All nworkers are available for simulations +- No alloc_specs needed — all allocator options are available as GenSpecs parameters +- Use `async_return=True` in GenSpecs unless there is a reason to use batch returns + +## VOCS generators (Xopt / Optimas) + +Key patterns: +- Variables named individually in VOCS: `{"x0": [lb, ub], "x1": [lb, ub]}` +- Objectives named in VOCS: `{"f": "MINIMIZE"}` +- GenSpecs uses `generator=`, `vocs=`, `batch_size=` +- SimSpecs uses `vocs=` or `simulator=` for gest-api style sim functions +- No `add_random_streams()` needed +- Xopt generators need `initial_sample_method="uniform"` and `initial_batch_size=` + for initial evaluated data. Optimas handles its own sampling. + +See `references/generators.md` for the full generator selection guide. + +## Classic generators + +Used only when the generator has no VOCS version or the user explicitly requests it. +- One worker is consumed by the persistent generator +- Requires `add_random_streams()` +- APOSMM: see `references/aposmm.md` for full configuration details + +## Sim function patterns + +**Inline sim function** (no external app): Takes `(H, persis_info, sim_specs, libE_info)` +and returns `(H_o, persis_info)`. Or for VOCS gest-api style, takes `input_dict: dict` +and returns a dict. See `libensemble/sim_funcs/` for built-in examples. + +**Executor-based sim function** (external app): Uses MPIExecutor to run an application. +Pattern: +1. Register app in calling script: `exctr.register_app(full_path=..., app_name=...)` +2. In sim function: get executor from `libE_info["executor"]`, submit with + `exctr.submit(app_name=...)`, wait with `task.wait()` +3. Read output file to get objective value +4. Set `sim_dirs_make=True` in LibeSpecs +5. 
If using input file templating, set `sim_dir_copy_files=[input_file]` + +## Results interpretation + +After a successful run: +- Load the .npy output file with `np.load()` +- Always filter by `sim_ended == True` before analyzing — rows where sim_ended is False + contain uninitialized values (often zeros) that are NOT real results +- For APOSMM: check rows where `local_min == True` to find identified minima +- Report the count, location, and objective value of minima or best points found +- If the best objective value is exactly 0.0, verify those rows have sim_ended == True +- See `references/results_metadata.md` for full details + +## Reference docs (read as needed) + +All paths relative to this skill's directory: + +- `references/generators.md` — Generator selection guide, VOCS vs classic +- `references/aposmm.md` — APOSMM configuration, optimizer options, tuning +- `references/finding_objectives.md` — Identifying objective fields in results +- `references/results_metadata.md` — Interpreting history array, filtering results + +## User request + +$ARGUMENTS diff --git a/.claude/skills/generate-scripts/references/aposmm.md b/.claude/skills/generate-scripts/references/aposmm.md new file mode 100644 index 000000000..892007b87 --- /dev/null +++ b/.claude/skills/generate-scripts/references/aposmm.md @@ -0,0 +1,146 @@ +# APOSMM — Asynchronously Parallel Optimization Solver for Multiple Minima + +APOSMM coordinates concurrent local optimization runs to find multiple local minima on parallel hardware. Use when the user wants to find minima, optimize, or explore an optimization landscape. 

Module: `persistent_aposmm`
Function: `aposmm`
Allocator: `persistent_aposmm_alloc` (NOT the default `start_only_persistent`)
Requirements: mpmath, SciPy (plus optional packages for specific local optimizers)

## APOSMM gen_specs in generated scripts

When generating APOSMM scripts, the calling script uses this gen_specs structure:

```python
gen_specs = GenSpecs(
    gen_f=gen_f,
    inputs=[],
    persis_in=["sim_id", "x", "x_on_cube", "f"],
    outputs=[("x", float, n), ("x_on_cube", float, n), ("sim_id", int),
             ("local_min", bool), ("local_pt", bool)],
    user={
        "initial_sample_size": num_workers,
        "localopt_method": "scipy_Nelder-Mead",
        "opt_return_codes": [0],
        "nu": 1e-8,
        "mu": 1e-8,
        "dist_to_bound_multiple": 0.01,
        "max_active_runs": 6,
        "lb": np.array([...]),  # MUST match user's requested bounds
        "ub": np.array([...]),  # MUST match user's requested bounds
    }
)
```

With allocator:
```python
from libensemble.alloc_funcs.persistent_aposmm_alloc import persistent_aposmm_alloc as alloc_f
```

## Required gen_specs["user"] Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| `lb` | n floats | Lower bounds on search domain |
| `ub` | n floats | Upper bounds on search domain |
| `localopt_method` | str | Local optimizer (see table below) |
| `initial_sample_size` | int | Uniform samples before starting local runs |

When using a SciPy method, you must also supply `opt_return_codes` — e.g., [0] for Nelder-Mead/BFGS, [1] for COBYLA.

## Optional gen_specs["user"] Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| `max_active_runs` | int | Max concurrent local optimization runs. Must not exceed nworkers. 
| +| `dist_to_bound_multiple` | float (0,1] | Fraction of distance to boundary for initial step size | +| `mu` | float | Min distance from boundary for starting points | +| `nu` | float | Min distance from identified minima for starting points | +| `stop_after_k_minima` | int | Stop after this many local minima found | +| `stop_after_k_runs` | int | Stop after this many runs ended | +| `sample_points` | numpy array | Specific points to sample (original domain) | +| `lhs_divisions` | int | Latin hypercube partitions (0 or 1 = uniform) | +| `rk_const` | float | Multiplier for r_k value | + +## Worker Configuration + +With `gen_on_manager=True`, the persistent generator runs on the manager process and all `nworkers` are available for simulations. + +## Local Optimizer Methods + +### SciPy (no extra install) + +| Method | Gradient? | `opt_return_codes` | +|--------|-----------|-------------------| +| `scipy_Nelder-Mead` | No | [0] | +| `scipy_COBYLA` | No | [1] | +| `scipy_BFGS` | Yes | [0] | + +### NLopt (requires nlopt package) + +| Method | Gradient? | Description | +|--------|-----------|-------------| +| `LN_SBPLX` | No | Subplex. Good for noisy/nonsmooth | +| `LN_BOBYQA` | No | Quadratic model. Good for smooth problems | +| `LN_COBYLA` | No | Constrained optimization | +| `LN_NEWUOA` | No | Unconstrained quadratic model | +| `LN_NELDERMEAD` | No | Classic simplex | +| `LD_MMA` | Yes | Method of Moving Asymptotes | + +NLopt methods require convergence tolerances. If the user does not specify tolerances, use these defaults: + +```python +"xtol_abs": 1e-6, +"ftol_abs": 1e-6, +``` + +When using an NLopt method, always include `rk_const` scaled to the problem dimension: + +```python +from math import gamma, pi, sqrt +n = +rk_const = 0.5 * ((gamma(1 + (n / 2)) * 5) ** (1 / n)) / sqrt(pi) +``` + +Use this formula directly in the generated script — do not precompute the value. 
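
For reference, a quick sanity check of what the formula evaluates to for a 2-dimensional problem (the generated script should still embed the formula itself, not this number):

```python
from math import gamma, pi, sqrt

n = 2  # example problem dimension
rk_const = 0.5 * ((gamma(1 + (n / 2)) * 5) ** (1 / n)) / sqrt(pi)
print(round(rk_const, 4))  # ~0.6308 for n = 2
```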

### PETSc/TAO (requires petsc4py package)

| Method | Needs | Description |
|--------|-------|-------------|
| `pounders` | fvec | Least-squares trust-region |
| `blmvm` | grad | Bounded limited-memory variable metric |
| `nm` | f only | Nelder-Mead variant |

### DFO-LS (requires dfols package)

| Method | Needs | Description |
|--------|-------|-------------|
| `dfols` | fvec | Derivative-free least-squares |

## Choosing a Local Optimizer

- **Default / simple**: `scipy_Nelder-Mead` — no extra packages
- **Smooth, bounded**: `LN_BOBYQA` (NLopt)
- **Noisy objectives**: `LN_SBPLX` (NLopt) or `scipy_Nelder-Mead`
- **Gradient available**: `scipy_BFGS` or `LD_MMA`
- **Least-squares (vector output)**: `pounders` (PETSc) or `dfols`
- **Constrained**: `scipy_COBYLA` or `LN_COBYLA`

## Interpreting Results

After a run, report the number of minima found. Load the results `.npy` file,
filter by `sim_ended == True`, then check `local_min == True` rows.
Report the count, objective value, and location of each minimum.

## Tuning

If APOSMM is not finding minima, try increasing the multiplier in `rk_const` (e.g., from 0.5 to a larger value) to make it more aggressive about starting new local optimization runs in different regions.

Also consider increasing `dist_to_bound_multiple` (e.g., 0.5) for a larger initial
step size.

## Important

Always use the bounds, sim_max, and paths from the user's request. Never substitute values from examples or known problem domains. diff --git a/.claude/skills/generate-scripts/references/finding_objectives.md b/.claude/skills/generate-scripts/references/finding_objectives.md new file mode 100644 index 000000000..e8f8e4bde --- /dev/null +++ b/.claude/skills/generate-scripts/references/finding_objectives.md @@ -0,0 +1,31 @@ +# Finding Objective Fields
+ +## VOCS scripts + +The objective field name is defined in the VOCS object: +```python +vocs = VOCS( + variables={"x0": [-2, 2], "x1": [-1, 1]}, + objectives={"f": "MINIMIZE"}, +) +``` + +The key in `objectives` (e.g. `"f"`) is the objective field name in the results. + +## Classic scripts + +The objective field name is defined in `sim_specs` outputs: +```python +sim_specs = SimSpecs( + ... + outputs=[("f", float)], # "f" is the objective field name +) +``` + +The field name in `outputs` (e.g. `"f"`) matches the field name in the `.npy` results file. + +## Common patterns +- Single objective: `{"f": "MINIMIZE"}` (VOCS) or `outputs=[("f", float)]` (classic) +- Multiple outputs: `"f"` is typically the objective — the scalar float used by the generator +- The objective field name in the VOCS definition or sim_specs outputs matches the field in the results diff --git a/.claude/skills/generate-scripts/references/generators.md b/.claude/skills/generate-scripts/references/generators.md new file mode 100644 index 000000000..e6f6b3ccc --- /dev/null +++ b/.claude/skills/generate-scripts/references/generators.md @@ -0,0 +1,81 @@ +# libEnsemble Generator Functions + +This guide is for choosing a generator when one is not already provided. If the user +is converting an existing workflow that already has a generator, use that generator +as-is — do not use this guide to replace it. + +libEnsemble supports two styles of generator configuration: + +- **VOCS generators (gest-api)** — The default style. Uses a VOCS object to define variables, objectives, and constraints. The generator is passed as an object from Xopt, Optimas, or another gest-api compatible library. +- **Classic generators** — libEnsemble-native gen functions configured via `gen_f`, explicit `inputs`/`outputs`, and `user` dicts with bounds/parameters. Used only when the generator has no VOCS version or the user explicitly requests it. 
+ +## When to Choose a Generator Style + +**VOCS is the default style.** Any generator from Xopt or Optimas is always VOCS — these libraries provide many generators covering optimization, sampling, surrogate modeling, and more. Do not switch an Xopt or Optimas generator to a classic libEnsemble generator. + +Use **classic generators** only when: +- The user explicitly asks for the classic/traditional style +- The generator does not have a VOCS version (APOSMM, persistent_sampling) + +## Choosing a generator + +| Goal | Suggested generator | Style | Package | +|------|---------------------|-------|---------| +| Bayesian optimization | Xopt (e.g., Expected Improvement) | VOCS | `xopt` | +| Sampling / exploration | Xopt (e.g., Latin Hypercube) | VOCS | `xopt` | +| Ax-based optimization, multi-fidelity, multi-task | Optimas | VOCS | `optimas` | +| Simplex optimization | Xopt Nelder-Mead | VOCS | `xopt` | +| Multi-objective Bayesian | Xopt MOBO | VOCS | `xopt` | +| GP-based adaptive sampling | gpCAM | VOCS or Classic | `gen_classes/gpCAM` | +| Find multiple local minima | APOSMM | VOCS or Classic | `gen_classes/aposmm` | +| Random/uniform sampling | Sampling | VOCS or Classic | `gen_classes/sampling` | + +Xopt and Optimas each provide many generators beyond those listed here. If the +generator choice is not clear, check the library documentation: +- Xopt: https://github.com/xopt-org/Xopt — algorithms at https://xopt.xopt.org/algorithms/ +- Optimas: https://github.com/optimas-org/optimas + +If the user says "optimize" without specifics -> Xopt (VOCS). +If the user says "Xopt", "VOCS", "Optimas", or names a specific generator from those libraries -> VOCS style. +If the user says "Ax", "multi-fidelity", "multi-task" -> Optimas (VOCS). +If the user says "find minima", "multiple local minima" -> APOSMM (classic). +If the user says "sample", "explore", "sweep" -> Xopt or Optimas can do this (VOCS), or persistent sampling (classic). 
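
The keyword rules above can be condensed into a rough dispatch sketch (the triggers are illustrative, not exhaustive — always fall through to Xopt when nothing specific matches):

```python
def suggest_generator(request: str) -> str:
    """Rough mapping from request keywords to a generator suggestion."""
    words = set(request.lower().replace(",", " ").split())
    if words & {"ax", "multi-fidelity", "multi-task"}:
        return "Optimas (VOCS)"
    if "minima" in words or "minimum" in words:
        return "APOSMM (classic)"
    if words & {"sample", "explore", "sweep"}:
        return "Xopt/Optimas sampling (VOCS) or persistent_sampling (classic)"
    return "Xopt (VOCS)"  # default when the user just says "optimize"

print(suggest_generator("find multiple local minima"))  # APOSMM (classic)
```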

## VOCS Generators (gest-api)

VOCS is the default configuration style for generators in libEnsemble. Configuration uses a VOCS object to define the optimization problem and a generator object. Generators may come from Xopt, Optimas, libEnsemble, or other gest-api compatible libraries.

### Key patterns

- Variables are named individually in VOCS (`{"x0": [lb, ub], "x1": [lb, ub]}`)
- Objectives are named in VOCS (`{"f": "MINIMIZE"}`)
- GenSpecs uses `generator=`, `vocs=`, and `batch_size=`
- SimSpecs uses `vocs=` or `simulator=` for gest-api style sim functions
- No alloc_specs needed (default is correct)
- No `add_random_streams()` needed
- Use `async_return=True` in GenSpecs unless the generator requires batch returns

### Initial sampling

Some generators require evaluated data before they can suggest points. Set `initial_sample_method` in GenSpecs to have libEnsemble produce and evaluate an initial sample before starting the generator:

- `initial_sample_method="uniform"` — uniform random sample from VOCS bounds
- `initial_batch_size` — required, specifies how many sample points to produce

Generators that handle their own sampling do not need this.

### Sim function adaptation

When using VOCS generators with an executor-based sim function, the sim must read individual variable names from H rather than unpacking `H["x"]`. The `input_names` in `sim_specs["user"]` should match the VOCS variable names directly.

## Classic Generators

### persistent_sampling (persistent_uniform)
Random uniform sampling across parameter space. After the initial batch, creates p new random points for every p points returned.

gen_specs["user"]: `lb`, `ub`, `initial_batch_size`
gen_specs outputs: `x (float, n)`

### APOSMM (persistent_aposmm)
Asynchronously Parallel Optimization Solver for finding Multiple Minima.
See `references/aposmm.md` for full configuration details. 
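
As a sketch, a classic persistent_sampling setup might look like the following. The import paths and field lists should be verified against a regression test such as `test_1d_sampling.py`, and the dimension and bounds here are hypothetical:

```python
import numpy as np
from libensemble.specs import GenSpecs
from libensemble.gen_funcs.persistent_sampling import persistent_uniform as gen_f

n = 2  # hypothetical problem dimension
gen_specs = GenSpecs(
    gen_f=gen_f,
    persis_in=["sim_id", "x", "f"],    # fields returned to the generator
    outputs=[("x", float, n)],         # points produced by the generator
    user={
        "lb": np.array([-3.0, -2.0]),  # hypothetical lower bounds
        "ub": np.array([3.0, 2.0]),    # hypothetical upper bounds
        "initial_batch_size": 4,
    },
)
```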
diff --git a/.claude/skills/generate-scripts/references/results_metadata.md b/.claude/skills/generate-scripts/references/results_metadata.md new file mode 100644 index 000000000..97a71c14b --- /dev/null +++ b/.claude/skills/generate-scripts/references/results_metadata.md @@ -0,0 +1,42 @@ +# Results Metadata
How to interpret libEnsemble history array fields and filter for completed simulations.

## History array (H)

The `.npy` output file contains the history array H with both user-defined fields and
metadata fields added by libEnsemble.

## Key metadata fields

- `sim_ended`: True if the simulation completed. Only rows with `sim_ended == True` have valid results.
- `sim_started`: True if the simulation was dispatched to a worker.
- `returned`: True if results were returned to the manager.
- `sim_id`: Unique simulation ID (0-indexed).
- `gen_informed`: True if the generator has been informed of this result.

## Filtering for valid results

When analyzing results (e.g., finding the minimum objective value), always filter for
completed simulations:

```python
H = np.load("results.npy")
done = H[H["sim_ended"]]  # Only completed simulations
```

Rows where `sim_ended` is False may have default/zero values that are not real results.
This is common for the last few rows when the simulation budget is exhausted — they were
allocated by the generator but never evaluated.

## Reporting results

After a successful run, report any minima found in the results. See the generator-specific
guide for which fields indicate identified minima.

## Common pitfall

If the minimum objective value is exactly 0.0, check whether those rows have
`sim_ended == True`. Unevaluated rows often have fields initialized to zero.
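
The filtering advice above can be demonstrated with a synthetic history array. The field names follow the conventions in this guide; a real output file contains more metadata fields:

```python
import numpy as np

# Synthetic stand-in for a libEnsemble history array (illustrative only)
dtype = [("sim_id", int), ("x", float, 2), ("f", float), ("sim_ended", bool)]
H = np.zeros(5, dtype=dtype)
H["sim_id"] = np.arange(5)
H["x"][:3] = [[0.5, -0.2], [1.1, 0.3], [-0.7, 0.9]]
H["f"][:3] = [3.2, 1.7, 2.4]
H["sim_ended"][:3] = True  # last two rows were allocated but never evaluated

done = H[H["sim_ended"]]        # filter first: unevaluated rows hold zeros
best = done[np.argmin(done["f"])]
print(best["f"])                # 1.7 — the true minimum
print(H["f"].min())             # 0.0 — the pitfall if you forget to filter
```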