← Previous: Numerical Differentiation | Back to Index | Next: Root Finding →
Optimization is the process of finding the parameter set that minimizes (or maximizes) an objective function. The Numerics library provides a comprehensive suite of local and global optimization algorithms for both unconstrained and constrained problems. These methods are essential for parameter estimation, model calibration, machine learning, and engineering design optimization.
| Method | Type | Best For | Requires Derivatives |
|---|---|---|---|
| Local Methods | |||
| BFGS | Local | Smooth, differentiable functions | No (numerical) |
| Nelder-Mead | Local | General purpose, robust | No |
| Powell | Local | Smooth functions without derivatives | No |
| Golden Section | Local | 1D problems | No |
| Brent Search | Local | 1D problems, smooth functions | No |
| Gradient Descent | Local | Large-scale, smooth problems | No (numerical) |
| ADAM | Local | Machine learning, stochastic | No (numerical) |
| Global Methods | |||
| Differential Evolution | Global | Multimodal, robust | No |
| Particle Swarm | Global | Multimodal, fast convergence | No |
| Shuffled Complex Evolution | Global | Hydrological calibration | No |
| Simulated Annealing | Global | Discrete/continuous, multimodal | No |
| Multi-Start | Global | Combines local search with random starts | No |
| MLSL | Global | Smooth multimodal functions | Requires local solver |
| Constrained | |||
| Augmented Lagrangian | Constrained | Equality and inequality constraints | No |
An optimization problem seeks to find:

$$\min_{\mathbf{x} \in \mathbb{R}^n} f(\mathbf{x})$$

subject to:

$$\mathbf{x}_L \leq \mathbf{x} \leq \mathbf{x}_U$$

where $f: \mathbb{R}^n \to \mathbb{R}$ is the objective function, $\mathbf{x}$ is the vector of $n$ decision parameters, and $\mathbf{x}_L$, $\mathbf{x}_U$ are the lower and upper bounds of the search space.

For maximization problems, use the Maximize() method, which minimizes $-f(\mathbf{x})$.
All optimizers in Numerics inherit from the Optimizer base class and share a common interface.
- `MaxIterations`: Maximum iterations allowed (default = 10,000)
- `MaxFunctionEvaluations`: Maximum function evaluations (default = int.MaxValue)
- `AbsoluteTolerance`: Absolute convergence tolerance (default = 1E-8)
- `RelativeTolerance`: Relative convergence tolerance (default = 1E-8)
- `ReportFailure`: Throw exception on failure (default = true)
- `RecordTraces`: Save optimization trace (default = true)
- `ComputeHessian`: Compute Hessian at solution (default = true)
- `BestParameterSet`: Optimal solution (Values, Fitness)
- `Iterations`: Number of iterations performed
- `FunctionEvaluations`: Number of function evaluations
- `Status`: Optimization status (Success, Failure, etc.)
- `ParameterSetTrace`: Full trace of parameter evaluations
- `Hessian`: Numerically differentiated Hessian matrix (if computed)
- `Minimize()`: Minimize the objective function
- `Maximize()`: Maximize the objective function (minimizes $-f(\mathbf{x})$)
Local optimization methods find the nearest local minimum from a starting point. They are fast and efficient for smooth, unimodal functions but may get trapped in local minima for multimodal problems.
BFGS is a quasi-Newton method that builds an approximation to the Hessian matrix using gradient information [1]. It's one of the most effective methods for smooth, unconstrained optimization.
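The update can be stated compactly. With step $\mathbf{s}_k = \mathbf{x}_{k+1} - \mathbf{x}_k$ and gradient change $\mathbf{y}_k = \nabla f_{k+1} - \nabla f_k$ (the gradients are obtained numerically in this implementation), the standard BFGS update of the inverse-Hessian approximation $H_k$ is [1]:

$$H_{k+1} = \left(I - \rho_k \mathbf{s}_k \mathbf{y}_k^T\right) H_k \left(I - \rho_k \mathbf{y}_k \mathbf{s}_k^T\right) + \rho_k \mathbf{s}_k \mathbf{s}_k^T, \qquad \rho_k = \frac{1}{\mathbf{y}_k^T \mathbf{s}_k}$$

The next search direction is then $-H_{k+1} \nabla f_{k+1}$, followed by a line search along it.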
```csharp
using Numerics.Mathematics.Optimization;

// Rosenbrock function: f(x,y) = (1-x)² + 100(y-x²)²
double Rosenbrock(double[] x)
{
    return Math.Pow(1 - x[0], 2) + 100 * Math.Pow(x[1] - x[0] * x[0], 2);
}

// Initial guess and bounds
var initial = new double[] { -1.2, 1.0 };
var lower = new double[] { -2, -2 };
var upper = new double[] { 2, 2 };

// Create and configure the optimizer
var bfgs = new BFGS(Rosenbrock, 2, initial, lower, upper);
bfgs.RelativeTolerance = 1e-8;
bfgs.MaxIterations = 1000;

// Minimize
bfgs.Minimize();

// Results
Console.WriteLine($"Optimal solution: x = {bfgs.BestParameterSet.Values[0]:F6}, y = {bfgs.BestParameterSet.Values[1]:F6}");
Console.WriteLine($"Minimum value: {bfgs.BestParameterSet.Fitness:F10}");
Console.WriteLine($"Iterations: {bfgs.Iterations}");
Console.WriteLine($"Function evaluations: {bfgs.FunctionEvaluations}");
Console.WriteLine($"Status: {bfgs.Status}");
```

Advantages: Fast convergence, memory efficient, works well for most smooth problems.

Disadvantages: Can fail on non-smooth functions, requires a bounded search space.
The Nelder-Mead method is a direct search algorithm that uses a simplex (geometric figure with n+1 vertices in n dimensions) to search the parameter space [2]. It's robust and doesn't require derivatives.
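At each iteration the worst vertex $\mathbf{x}_w$ is replaced using the centroid $\bar{\mathbf{x}}$ of the remaining vertices. With the standard coefficients, the candidate moves are:

$$\text{reflect: } \bar{\mathbf{x}} + (\bar{\mathbf{x}} - \mathbf{x}_w), \qquad \text{expand: } \bar{\mathbf{x}} + 2(\bar{\mathbf{x}} - \mathbf{x}_w), \qquad \text{contract: } \bar{\mathbf{x}} \pm \tfrac{1}{2}(\bar{\mathbf{x}} - \mathbf{x}_w)$$

If none of these improves on the worst vertex, the whole simplex shrinks toward its best vertex by a factor of $\tfrac{1}{2}$.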
```csharp
var nm = new NelderMead(Rosenbrock, 2, initial, lower, upper);
nm.RelativeTolerance = 1e-6;
nm.Minimize();

Console.WriteLine($"Optimal solution: [{nm.BestParameterSet.Values[0]:F6}, {nm.BestParameterSet.Values[1]:F6}]");
Console.WriteLine($"Minimum value: {nm.BestParameterSet.Fitness:F10}");
```

Advantages: Very robust, doesn't require derivatives, handles non-smooth functions.

Disadvantages: Slower convergence than gradient-based methods, can stagnate.
Powell's method is a conjugate direction algorithm that doesn't require derivatives [1]. It performs successive line searches along conjugate directions.
```csharp
var powell = new Powell(Rosenbrock, 2, initial, lower, upper);
powell.RelativeTolerance = 1e-8;
powell.Minimize();

Console.WriteLine($"Solution: [{powell.BestParameterSet.Values[0]:F6}, {powell.BestParameterSet.Values[1]:F6}]");
```

Advantages: No derivatives required, good for smooth functions.

Disadvantages: Can be slow in high dimensions, sensitive to scaling.
Simple gradient-based optimization with line search:
```csharp
var gd = new GradientDescent(Rosenbrock, 2, initial, lower, upper);
gd.Alpha = 0.001; // Step size (learning rate)
gd.Minimize();
```

Adaptive Moment Estimation (ADAM), popular in machine learning applications [3]:
```csharp
var adam = new ADAM(Rosenbrock, 2, initial, lower, upper);
adam.Alpha = 0.001;
adam.Beta1 = 0.9;   // First moment decay
adam.Beta2 = 0.999; // Second moment decay
adam.Minimize();
```

Advantages: Adaptive learning rates, works well for noisy objectives.

Best for: Machine learning, stochastic optimization problems.
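For reference, ADAM maintains exponentially decaying averages of the gradient and squared gradient [3]. With gradient $g_t$ at step $t$:

$$m_t = \beta_1 m_{t-1} + (1-\beta_1) g_t, \qquad v_t = \beta_2 v_{t-1} + (1-\beta_2) g_t^2$$

$$\hat{m}_t = \frac{m_t}{1-\beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1-\beta_2^t}, \qquad \theta_t = \theta_{t-1} - \alpha \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}$$

The Alpha, Beta1, and Beta2 properties correspond to $\alpha$, $\beta_1$, and $\beta_2$; $\epsilon$ is a small constant that prevents division by zero.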
For single-parameter optimization, specialized 1D methods are more efficient:
Uses the golden ratio to successively narrow a bracketing interval around the minimum, shrinking the interval by a factor of $\varphi - 1 \approx 0.618$ per iteration:
```csharp
Func<double, double> f1d = x => Math.Pow(x - 2, 2) + 3;

var golden = new GoldenSection(f1d, 0.0, 5.0);
golden.Minimize();

Console.WriteLine($"Minimum at x = {golden.BestParameterSet.Values[0]:F6}");
```

Combines golden section search with parabolic interpolation for faster convergence:
```csharp
var brent = new BrentSearch(f1d, 0.0, 5.0);
brent.Minimize();
```

Global optimization methods are designed to find the global minimum across the entire search space, avoiding local minima. They typically require more function evaluations but are more robust for multimodal problems.
Differential Evolution (DE) is a population-based evolutionary algorithm that is very robust for continuous optimization [4]. For each member $\mathbf{x}_i$ of the population, a trial vector is generated in three steps:

Mutation (DE/rand/1): Three distinct population members $\mathbf{x}_{r_0}$, $\mathbf{x}_{r_1}$, and $\mathbf{x}_{r_2}$ (all different from $\mathbf{x}_i$) are combined into a mutant vector:

$$\mathbf{v}_i = \mathbf{x}_{r_0} + F\left(\mathbf{x}_{r_1} - \mathbf{x}_{r_2}\right)$$

where $F$ is the mutation scaling factor (the `Mutation` property).

Crossover (binomial): The trial vector $\mathbf{u}_i$ inherits each component from the mutant with probability $CR$ (the `CrossoverProbability`), and from the target otherwise; at least one component is always taken from the mutant:

$$u_{i,j} = \begin{cases} v_{i,j} & \text{if } r_j \leq CR \text{ or } j = j_{rand} \\ x_{i,j} & \text{otherwise} \end{cases}$$

where $r_j \sim U(0,1)$ and $j_{rand}$ is a randomly chosen component index.

Selection (greedy): The trial vector replaces the target only if it has equal or better fitness:

$$\mathbf{x}_i \leftarrow \begin{cases} \mathbf{u}_i & \text{if } f(\mathbf{u}_i) \leq f(\mathbf{x}_i) \\ \mathbf{x}_i & \text{otherwise} \end{cases}$$

The algorithm converges when the standard deviation of fitness values across the population falls below the tolerance.
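The three steps above can be sketched in a short, library-independent way. This is illustrative Python, not the Numerics implementation; it runs one DE/rand/1/bin search on the sphere function:

```python
# Illustrative sketch of DE/rand/1 with binomial crossover and greedy selection.
import random

def de_minimize(f, lower, upper, pop_size=30, F=0.75, CR=0.9, generations=200, seed=1):
    rng = random.Random(seed)
    dim = len(lower)
    pop = [[rng.uniform(lower[j], upper[j]) for j in range(dim)] for _ in range(pop_size)]
    fit = [f(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: v = x_r0 + F * (x_r1 - x_r2), with r0, r1, r2 distinct and != i
            r0, r1, r2 = rng.sample([k for k in range(pop_size) if k != i], 3)
            v = [pop[r0][j] + F * (pop[r1][j] - pop[r2][j]) for j in range(dim)]
            # Binomial crossover: at least one component (j_rand) comes from the mutant
            j_rand = rng.randrange(dim)
            u = [v[j] if (rng.random() < CR or j == j_rand) else pop[i][j] for j in range(dim)]
            # Clamp the trial vector to the search bounds
            u = [min(max(u[j], lower[j]), upper[j]) for j in range(dim)]
            # Greedy selection: keep the trial only if it is at least as good
            fu = f(u)
            if fu <= fit[i]:
                pop[i], fit[i] = u, fu
    best = min(range(pop_size), key=lambda k: fit[k])
    return pop[best], fit[best]

sphere = lambda x: sum(v * v for v in x)
best_x, best_f = de_minimize(sphere, [-5.0, -5.0], [5.0, 5.0])
```

On this smooth test function the population collapses onto the origin well within 200 generations; the same loop applies unchanged to multimodal objectives such as Rastrigin.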
```csharp
using Numerics.Mathematics.Optimization;

// Rastrigin function (highly multimodal)
double Rastrigin(double[] x)
{
    double sum = 10 * x.Length;
    for (int i = 0; i < x.Length; i++)
    {
        sum += x[i] * x[i] - 10 * Math.Cos(2 * Math.PI * x[i]);
    }
    return sum;
}

var lower = new double[] { -5.12, -5.12 };
var upper = new double[] { 5.12, 5.12 };

var de = new DifferentialEvolution(Rastrigin, 2, lower, upper);
de.PopulationSize = 50;
de.CrossoverProbability = 0.9;
de.Mutation = 0.75;
de.MaxIterations = 1000;
de.Minimize();

Console.WriteLine($"Global minimum: [{de.BestParameterSet.Values[0]:F6}, {de.BestParameterSet.Values[1]:F6}]");
Console.WriteLine($"Function value: {de.BestParameterSet.Fitness:F10}");
```

Advantages: Very robust, handles discontinuities, good for difficult problems.
Parameters:
- `PopulationSize`: Number of candidate solutions (default = 10 × dimensions)
- `CrossoverProbability`: Probability of crossover (default = 0.9)
- `Mutation`: Scaling factor for mutation (default = 0.75)
PSO simulates the social behavior of bird flocking or fish schooling [5]. Each particle $i$ updates its velocity and position according to:

$$\mathbf{v}_i \leftarrow w\,\mathbf{v}_i + c_1 r_1 (\mathbf{p}_i - \mathbf{x}_i) + c_2 r_2 (\mathbf{g} - \mathbf{x}_i), \qquad \mathbf{x}_i \leftarrow \mathbf{x}_i + \mathbf{v}_i$$

where:

- $w$ is the inertia weight, which decreases linearly from $w_{\max} = 0.9$ to $w_{\min} = 0.4$ over the optimization, balancing exploration (high $w$) and exploitation (low $w$)
- $c_1 = c_2 = 2.05$ are the cognitive and social acceleration coefficients
- $r_1, r_2 \sim U(0,1)$ are independent random numbers (per component, per particle)
- $\mathbf{p}_i$ is particle $i$'s personal best position (best position it has visited)
- $\mathbf{g}$ is the global best position (best position any particle has visited)
The three terms represent momentum (continue in the current direction), cognitive pull (return toward personal best), and social pull (move toward the swarm's best).
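These update rules can be sketched library-independently. This is illustrative Python, not the Numerics implementation; it uses the constriction-equivalent coefficients $w = 0.729$, $c_1 = c_2 = 1.49445$ (a common parameterization with guaranteed-stable dynamics) rather than the library's defaults described above:

```python
# Illustrative PSO sketch: momentum + cognitive pull + social pull per component.
import random

def pso_minimize(f, lower, upper, n_particles=40, iters=300, seed=1):
    rng = random.Random(seed)
    dim = len(lower)
    w, c1, c2 = 0.729, 1.49445, 1.49445
    x = [[rng.uniform(lower[j], upper[j]) for j in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    pbest = [xi[:] for xi in x]            # personal best positions
    pfit = [f(xi) for xi in x]
    gi = min(range(n_particles), key=lambda k: pfit[k])
    gbest, gfit = pbest[gi][:], pfit[gi]   # global best position
    for _ in range(iters):
        for i in range(n_particles):
            for j in range(dim):
                r1, r2 = rng.random(), rng.random()
                v[i][j] = (w * v[i][j]
                           + c1 * r1 * (pbest[i][j] - x[i][j])
                           + c2 * r2 * (gbest[j] - x[i][j]))
                # Move the particle and keep it inside the bounds
                x[i][j] = min(max(x[i][j] + v[i][j], lower[j]), upper[j])
            fx = f(x[i])
            if fx < pfit[i]:
                pbest[i], pfit[i] = x[i][:], fx
                if fx < gfit:
                    gbest, gfit = x[i][:], fx
    return gbest, gfit

sphere = lambda z: sum(t * t for t in z)
best_x, best_f = pso_minimize(sphere, [-5.0, -5.0], [5.0, 5.0])
```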
```csharp
var pso = new ParticleSwarm(Rastrigin, 2, lower, upper);
pso.PopulationSize = 40;
pso.Minimize();

Console.WriteLine($"Solution: [{pso.BestParameterSet.Values[0]:F6}, {pso.BestParameterSet.Values[1]:F6}]");
```

Advantages: Fast convergence, simple concept, works well for continuous problems.
Parameters:
- `PopulationSize`: Number of particles (default = 30)
SCE-UA was specifically developed for calibrating hydrological models [6]. The algorithm partitions the population into several complexes (set via the `Complexes` property); each complex evolves independently through the Competitive Complex Evolution (CCE) procedure, and the complexes are then shuffled together and re-partitioned so that information is shared across the whole population.
The CCE step within each complex proceeds as follows:

1. Sub-complex selection: Select $q = D+1$ points (where $D$ is the number of parameters) from the complex of $N$ points using a trapezoidal probability distribution that favors better-ranked points: $P(i) = \frac{2(N+1-i)}{N(N+1)}$
2. Reflection: Compute the centroid $\mathbf{g}$ of all sub-complex points except the worst point $\mathbf{x}_w$, then reflect: $\mathbf{x}_r = 2\mathbf{g} - \mathbf{x}_w$
3. Contraction: If the reflected point is infeasible or worse than $\mathbf{x}_w$, try contraction: $\mathbf{x}_c = \frac{\mathbf{g} + \mathbf{x}_w}{2}$
4. Mutation: If contraction also fails, generate a random point within the smallest bounding box of the complex.
The shuffling step redistributes points across complexes, preventing any single complex from stagnating. This combination of local competitive evolution within complexes and global information sharing between complexes makes SCE-UA particularly effective for the rugged, multimodal objective functions typical of hydrological model calibration.
```csharp
var sce = new ShuffledComplexEvolution(Rastrigin, 2, lower, upper);
sce.Complexes = 5;
sce.MaxIterations = 1000;
sce.Minimize();

Console.WriteLine($"Calibrated parameters: [{sce.BestParameterSet.Values[0]:F6}, {sce.BestParameterSet.Values[1]:F6}]");
```

Advantages: Excellent for hydrological model calibration, balances exploration and exploitation.
Best for: Calibrating watershed models, water resources applications.
SA mimics the physical process of annealing in metallurgy [7]. It accepts uphill moves with decreasing probability, allowing escape from local minima.
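A candidate move that worsens the objective by $\Delta f > 0$ is accepted with the Metropolis probability

$$P(\text{accept}) = \exp\!\left(-\frac{\Delta f}{T}\right)$$

while improving moves are always accepted. Consistent with the CoolingRate parameter below, the temperature follows a geometric schedule, $T_{k+1} = \alpha\, T_k$, so uphill moves become increasingly rare as the search cools.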
```csharp
var sa = new SimulatedAnnealing(Rastrigin, 2, lower, upper);
sa.InitialTemperature = 10.0;
sa.CoolingRate = 0.95;
sa.MaxIterations = 10000;
sa.Minimize();

Console.WriteLine($"Solution: [{sa.BestParameterSet.Values[0]:F6}, {sa.BestParameterSet.Values[1]:F6}]");
```

Advantages: Can escape local minima, works for discrete and continuous problems.
Parameters:
- `InitialTemperature`: Starting temperature (default = 10)
- `CoolingRate`: Temperature reduction factor (default = 0.95)
Combines local search with multiple random starting points:
```csharp
var initial = new double[] { 0.0, 0.0 };
var ms = new MultiStart(Rastrigin, 2, initial, lower, upper, LocalMethod.BFGS);
ms.MaxIterations = 20; // Number of random starts
ms.Minimize();

Console.WriteLine($"Best solution: [{ms.BestParameterSet.Values[0]:F6}, {ms.BestParameterSet.Values[1]:F6}]");
```

Advantages: Simple to implement, leverages fast local search.
Disadvantages: May waste evaluations in same basin of attraction.
Clustering-based global optimization that avoids redundant local searches:
```csharp
var initial = new double[] { 0.0, 0.0 };
var mlsl = new MLSL(Rastrigin, 2, initial, lower, upper, LocalMethod.BFGS);
mlsl.Minimize();
```

Advantages: More efficient than multi-start, avoids redundant searches.
The Augmented Lagrangian method handles equality and inequality constraints by adding penalty terms to the objective function [8].
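In the standard formulation for equality constraints $g_i(\mathbf{x}) = 0$, the augmented Lagrangian combines a Lagrange multiplier term with a quadratic penalty [8]:

$$L_A(\mathbf{x}, \boldsymbol{\lambda}; \mu) = f(\mathbf{x}) + \sum_i \lambda_i\, g_i(\mathbf{x}) + \frac{\mu}{2} \sum_i g_i(\mathbf{x})^2$$

The inner optimizer minimizes $L_A$ for fixed multipliers; the multipliers are then updated ($\lambda_i \leftarrow \lambda_i + \mu\, g_i(\mathbf{x})$) and the penalty parameter $\mu$ is increased. Inequality constraints are handled analogously, with the penalty applied only while the constraint is violated or active.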
```csharp
using Numerics.Mathematics.Optimization;

// Objective: minimize x² + y²
double Objective(double[] x)
{
    return x[0] * x[0] + x[1] * x[1];
}

// Constraint: x + y >= 1
var constraint = new Constraint(
    x => x[0] + x[1], 2, 1.0, // g(x) >= value form
    ConstraintType.GreaterThanOrEqualTo
);

var lower = new double[] { -5, -5 };
var upper = new double[] { 5, 5 };
var initial = new double[] { 0, 0 };

// Create the inner optimizer
var bfgs = new BFGS(Objective, 2, initial, lower, upper);

// Create the Augmented Lagrange optimizer with constraints
var al = new AugmentedLagrange(Objective, bfgs, new IConstraint[] { constraint });
al.MaxIterations = 100;
al.Minimize();

Console.WriteLine($"Optimal solution: [{al.BestParameterSet.Values[0]:F6}, {al.BestParameterSet.Values[1]:F6}]");
Console.WriteLine($"Constraint satisfied: {al.BestParameterSet.Values[0] + al.BestParameterSet.Values[1]:F6} >= 1");
```

Constraint Types:
- `ConstraintType.EqualTo`: Equality constraint $g(\mathbf{x}) = 0$
- `ConstraintType.LesserThanOrEqualTo`: Inequality constraint $g(\mathbf{x}) \leq 0$
- `ConstraintType.GreaterThanOrEqualTo`: Inequality constraint $g(\mathbf{x}) \geq 0$
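The multiplier-update loop can be illustrated on the problem above (minimize $x^2 + y^2$ subject to $x + y \geq 1$, whose analytic optimum is $(0.5, 0.5)$). This is a toy gradient-based sketch in Python, not the Numerics implementation; the inner solve here is plain gradient descent with an analytic gradient:

```python
# Toy augmented Lagrangian sketch for: minimize x^2 + y^2  s.t.  x + y >= 1.
# Outer loop: update multiplier and grow penalty; inner loop: gradient descent.

def augmented_lagrangian(outer_iters=12, inner_iters=100):
    x, y = 0.0, 0.0        # infeasible starting point
    lam, mu = 0.0, 1.0     # multiplier estimate and penalty parameter
    for _ in range(outer_iters):
        lr = 0.4 / (1.0 + mu)              # step size shrinks as the penalty grows
        for _ in range(inner_iters):
            g = x + y - 1.0                # constraint value; feasible when g >= 0
            t = max(0.0, lam - mu * g)     # nonzero only while the constraint binds
            # Gradient of f + inequality penalty with respect to x and y
            x -= lr * (2.0 * x - t)
            y -= lr * (2.0 * y - t)
        lam = max(0.0, lam - mu * (x + y - 1.0))  # multiplier update
        mu *= 2.0                                  # tighten the penalty
    return x, y

x_opt, y_opt = augmented_lagrangian()
```

The multiplier converges to the KKT value $\lambda = 1$ and the iterates approach $(0.5, 0.5)$; in the library, the inner minimization is delegated to whichever optimizer you pass in (BFGS above).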
Example: minimize a quadratic objective subject to multiple constraints:
```csharp
// Minimize f(x,y) = (x-3)² + (y-2)²
// Subject to: x + y <= 5
//             x >= 1
//             y >= 1
double ObjectiveFunc(double[] x)
{
    return Math.Pow(x[0] - 3, 2) + Math.Pow(x[1] - 2, 2);
}

var c1 = new Constraint(x => x[0] + x[1], 2, 5.0, ConstraintType.LesserThanOrEqualTo);
var c2 = new Constraint(x => x[0], 2, 1.0, ConstraintType.GreaterThanOrEqualTo);
var c3 = new Constraint(x => x[1], 2, 1.0, ConstraintType.GreaterThanOrEqualTo);

var innerOptimizer = new BFGS(ObjectiveFunc, 2, new[] { 2.0, 2.0 },
    new[] { 0.0, 0.0 }, new[] { 10.0, 10.0 });

var constrained = new AugmentedLagrange(ObjectiveFunc, innerOptimizer,
    new IConstraint[] { c1, c2, c3 });
constrained.Minimize();

Console.WriteLine($"Constrained optimum: [{constrained.BestParameterSet.Values[0]:F4}, " +
    $"{constrained.BestParameterSet.Values[1]:F4}]");
```

A complete example of using optimization to calibrate a watershed model:
```csharp
using Numerics.Mathematics.Optimization;
using Numerics.Data.Statistics;

// Observed streamflow data
double[] observed = { 12.5, 15.3, 18.7, 16.2, 14.1, 11.8, 10.3 };

// Simple runoff model: Q = C × P^α where Q is flow, P is precipitation,
// and C and α are the parameters to calibrate
double[] precipitation = { 10, 12, 15, 13, 11, 9, 8 };

// Objective function: minimize RMSE
double ObjectiveFunction(double[] parameters)
{
    double C = parameters[0];
    double alpha = parameters[1];

    // Simulate streamflow
    double[] simulated = new double[observed.Length];
    for (int i = 0; i < observed.Length; i++)
    {
        simulated[i] = C * Math.Pow(precipitation[i], alpha);
    }

    // Compute RMSE
    return GoodnessOfFit.RMSE(observed, simulated);
}

// Parameter bounds
var lower = new double[] { 0.1, 0.5 }; // C >= 0.1, α >= 0.5
var upper = new double[] { 5.0, 3.0 }; // C <= 5.0, α <= 3.0

// Use SCE-UA (recommended for hydrological calibration)
var optimizer = new ShuffledComplexEvolution(ObjectiveFunction, 2, lower, upper);
optimizer.Complexes = 5;
optimizer.MaxIterations = 1000;
optimizer.Minimize();

Console.WriteLine($"Calibrated parameters:");
Console.WriteLine($"  C = {optimizer.BestParameterSet.Values[0]:F4}");
Console.WriteLine($"  α = {optimizer.BestParameterSet.Values[1]:F4}");
Console.WriteLine($"  RMSE = {optimizer.BestParameterSet.Fitness:F4}");
Console.WriteLine($"  Function evaluations: {optimizer.FunctionEvaluations}");

// Verify calibration
double[] finalSimulated = new double[observed.Length];
for (int i = 0; i < observed.Length; i++)
{
    finalSimulated[i] = optimizer.BestParameterSet.Values[0] *
        Math.Pow(precipitation[i], optimizer.BestParameterSet.Values[1]);
}
double nse = GoodnessOfFit.NashSutcliffeEfficiency(observed, finalSimulated);
Console.WriteLine($"  NSE = {nse:F4}");
```

| Problem Type | Recommended Method | Notes |
|---|---|---|
| Smooth, unimodal | BFGS or Powell | Fast convergence to local minimum |
| Non-smooth, unimodal | Nelder-Mead | Robust direct search |
| Multimodal, global search | Differential Evolution or SCE-UA | Thorough exploration |
| Quick global search | Particle Swarm | Faster but less thorough |
| Hydrological calibration | SCE-UA | Specifically designed for this |
| With constraints | Augmented Lagrangian | Handles equality and inequality constraints |
| 1D problem | Brent Search or Golden Section | Specialized efficient methods |
| Machine learning | ADAM | Adaptive learning rates |
| Unknown smoothness | Start with Nelder-Mead | Very robust |
| High dimensions (>20) | Differential Evolution or ADAM | Scale better than others |
- Scale Variables: Normalize parameters to similar ranges (e.g., [0,1] or [-1,1]) for better convergence.
- Set Reasonable Bounds: Tight bounds improve convergence but shouldn't exclude the optimum.
- Multiple Runs: For global optimization, run multiple times with different random seeds to ensure robustness.
- Hybrid Approach: Use a global method followed by local refinement:

  ```csharp
  // Global search
  var de = new DifferentialEvolution(ObjectiveFunc, n, lower, upper);
  de.Minimize();

  // Local refinement from the global solution
  var bfgs = new BFGS(ObjectiveFunc, n, de.BestParameterSet.Values, lower, upper);
  bfgs.Minimize();
  ```

- Monitor Convergence: Check the `ParameterSetTrace` to diagnose convergence issues:

  ```csharp
  foreach (var ps in optimizer.ParameterSetTrace)
  {
      Console.WriteLine($"Fitness: {ps.Fitness}");
  }
  ```

- Adjust Tolerances: Tighter tolerances require more evaluations:

  ```csharp
  optimizer.RelativeTolerance = 1e-10; // Very tight
  optimizer.AbsoluteTolerance = 1e-10;
  ```

- Population Size: For global methods, larger populations explore better but cost more:

  ```csharp
  de.PopulationSize = 20 * NumberOfParameters; // Rule of thumb
  ```
The BestParameterSet contains:

- `Values`: The optimal parameter vector
- `Fitness`: The objective function value at the optimum
- `Weight`: (Optional) Used internally by some algorithms
When `ComputeHessian = true`, the Hessian at the solution is computed numerically. The eigenvalues of the Hessian describe the curvature of the objective surface at the optimum:

- All eigenvalues positive: The solution is a local minimum (the Hessian is positive definite)
- Large eigenvalue: The objective is highly curved in that direction, so the corresponding parameter is well-determined
- Small eigenvalue: The objective is nearly flat, so the parameter is poorly determined (weakly identifiable)
- Near-zero eigenvalue: Indicates a ridge or valley in the objective surface, suggesting parameter correlation or redundancy
The inverse of the Hessian approximates the covariance matrix of the parameters, so parameter standard errors can be estimated as $SE(\hat{\theta}_i) \approx \sqrt{|H_{ii}^{-1}|}$:
```csharp
optimizer.ComputeHessian = true;
optimizer.Minimize();

// Access the Hessian
var H = optimizer.Hessian;

// Parameter standard errors (diagonal approximation)
for (int i = 0; i < optimizer.NumberOfParameters; i++)
{
    double se = Math.Sqrt(Math.Abs(1.0 / H[i, i]));
    Console.WriteLine($"Parameter {i} ± {se:F4}");
}
```

Check the Status property to verify success:
```csharp
if (optimizer.Status == OptimizationStatus.Success)
{
    Console.WriteLine("Optimization converged successfully");
}
else if (optimizer.Status == OptimizationStatus.MaximumIterationsReached)
{
    Console.WriteLine("Maximum iterations reached - may not have converged");
}
```

- Function Evaluation Cost: Most optimization time is spent evaluating the objective function. Optimize your function first.
- Parallel Evaluation: For population-based methods (DE, PSO), evaluate population members in parallel if your function is thread-safe.
- Warm Start: If solving similar problems repeatedly, use the previous solution as the initial guess.
- Gradient Information: If you can provide analytical gradients, BFGS will converge much faster (though the current implementation uses numerical gradients).
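The parallel-evaluation tip amounts to mapping the objective over the population with a worker pool. A generic Python sketch (not library code; for CPU-bound Python objectives a process pool would be the usual choice, but a thread pool is shown because it needs no pickling):

```python
# Sketch: evaluating a population of candidate solutions in parallel.
import math
from concurrent.futures import ThreadPoolExecutor

def rastrigin(x):
    """Thread-safe objective: no shared mutable state."""
    return 10 * len(x) + sum(v * v - 10 * math.cos(2 * math.pi * v) for v in x)

population = [[0.5 * i, -0.25 * i] for i in range(8)]

# Serial evaluation (what a population-based optimizer does each generation)
serial = [rastrigin(x) for x in population]

# Parallel evaluation: map the objective over the population on a thread pool;
# pool.map preserves the order of the population
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(rastrigin, population))
```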
[1] Nocedal, J., & Wright, S. J. (2006). Numerical Optimization (2nd ed.). Springer.
[2] Nelder, J. A., & Mead, R. (1965). A simplex method for function minimization. The Computer Journal, 7(4), 308-313.
[3] Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
[4] Storn, R., & Price, K. (1997). Differential evolution–a simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization, 11(4), 341-359.
[5] Kennedy, J., & Eberhart, R. (1995). Particle swarm optimization. Proceedings of IEEE International Conference on Neural Networks, 4, 1942-1948.
[6] Duan, Q., Sorooshian, S., & Gupta, V. (1992). Effective and efficient global optimization for conceptual rainfall-runoff models. Water Resources Research, 28(4), 1015-1031.
[7] Kirkpatrick, S., Gelatt, C. D., & Vecchi, M. P. (1983). Optimization by simulated annealing. Science, 220(4598), 671-680.
[8] Birgin, E. G., & Martínez, J. M. (2014). Practical Augmented Lagrangian Methods for Constrained Optimization. SIAM.