
Commit 5bd8f21

Merge branch 'report'
2 parents 4c5a861 + 06ae190

13 files changed: 344 additions & 57 deletions

README.md

Lines changed: 3 additions & 0 deletions
@@ -4,6 +4,9 @@ This project aims to explore and analyze metaheuristic search-based algorithms f
 ## Proposal
 This is our [idea](./Project%20Proposal/Project%20Proposal%20-%20Fernando%20and%20Kelvin.pdf).
 
+## Report
+This is a [summary](./report/CSI5186_AI_Testing_Project_Report___Fernando__Kelvin.pdf) of our work with valid justifications.
+
 ## Datasets
 * [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html)
 * Object Recognition

report/figures/convergence.png (100 KB)

report/figures/evaluations.png (328 KB)

report/figures/stability.png (43.5 KB)

26.3 KB

report/main.tex

Lines changed: 6 additions & 6 deletions
@@ -12,19 +12,17 @@
 \usepackage{amssymb}
 \usepackage{natbib}
 
-\title{Testing the Effectiveness, Efficiency, and Stability\\of Search-Based Hyperparameter Optimizers}
+\title{Testing the Effectiveness, Convergence, and Stability\\of Search-Based Hyperparameter Optimizers}
 \author{
 Fernando Berti Cruz Nogueira (abert036@uottawa.ca),
 Kelvin Mock (kmock073@uOttawa.ca)
 }
-\date{October 2025}
 
 \usepackage{fancyhdr}
 \setlength{\headheight}{12.5pt}
 \addtolength{\topmargin}{-0.5pt}
 \fancypagestyle{plain}{% the preset of fancyhdr
 \fancyhf{} % clear all header and footer fields
-\fancyfoot[L]{\thedate}
 \fancyhead[L]{CSI 5186 - AI-enabled Software Verification and Testing, Final Report (Fall 2025)}
 }
 \makeatletter
@@ -49,6 +47,8 @@
 \usepackage{booktabs}
 \usepackage{multirow}
 \usepackage{tabularx}
+\usepackage{tikz}
+\usetikzlibrary{positioning}
 \usepackage[colorlinks=true,citecolor=blue,linkcolor=blue]{hyperref}
 \begin{document}
 \maketitle
@@ -59,17 +59,17 @@
 \end{tabular}
 
 \begin{abstract}
-Hyperparameter optimization is a critical but computationally expensive task for developing effective machine learning models. This report presents an empirical study comparing a Randomized Search (RS) baseline against two representative metaheuristics: a Genetic Algorithm (GA) and Particle Swarm Optimization (PSO). This selection is made to contrast two primary search strategies: global exploration (GA) and local exploitation (PSO). We evaluate the ability of these algorithms to optimize the hyperparameters of three distinct machine learning models (Decision Tree, k-Nearest Neighbors, and a Convolutional Neural Network) on the grayscale CIFAR-10 dataset~\cite{krizhevsky2009learning}. To ensure a fair and balanced assessment we define a composite fitness function. We evaluate the optimizers across three quality attributes: effectiveness (solution quality), efficiency (computational cost), and stability (consistency across runs). The empirical results will be validated using statistical tests to provide statistically grounded conclusions.
+Hyperparameter optimization is a critical but computationally expensive task for developing effective machine learning models. This report presents an empirical study comparing a Randomized Search (RS) baseline against two representative metaheuristics: a Genetic Algorithm (GA) and Particle Swarm Optimization (PSO). This selection contrasts two primary search strategies: global exploration (GA) and local exploitation (PSO). We evaluate the ability of these algorithms to optimize the hyperparameters of three distinct machine learning models (Decision Tree, k-Nearest Neighbors, and a Convolutional Neural Network) on the grayscale CIFAR-10 dataset~\cite{krizhevsky2009learning}. To ensure a fair and balanced assessment, we define a composite fitness function. We evaluate the optimizers across three quality attributes: effectiveness (solution quality), convergence (improvement over a fixed evaluation budget), and stability (consistency across runs). The empirical results are validated using statistical tests to provide statistically grounded conclusions.
 \end{abstract}
 
 \input{sections/1_introduction}
 \input{sections/2_problem_formulation}
 \input{sections/3_experiment}
 \input{sections/4_results}
-
+\input{sections/5_limitations}
+\input{sections/6_conclusion}
 
 % BEFORE END
-\clearpage
 \bibliographystyle{plainnat}
 \bibliography{refs}
 
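The abstract references a composite fitness function, but this commit does not show its definition. As a purely illustrative sketch (the weighting scheme, normalization, and names below are assumptions, not the report's actual definition), such a function typically blends solution quality with a normalized cost penalty:

```python
def composite_fitness(accuracy: float, train_seconds: float,
                      alpha: float = 0.9, budget_seconds: float = 600.0) -> float:
    """Illustrative composite fitness (assumed form, not the report's):
    reward validation accuracy, penalize normalized training cost."""
    cost = min(train_seconds / budget_seconds, 1.0)  # clamp cost to [0, 1]
    return alpha * accuracy - (1.0 - alpha) * cost

# Example: 72% accuracy reached in 120 s scores higher than
# 74% accuracy reached in 580 s under this weighting.
print(composite_fitness(0.72, 120.0), composite_fitness(0.74, 580.0))
```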

report/refs.bib

Lines changed: 58 additions & 0 deletions
@@ -14,3 +14,61 @@ @techreport{krizhevsky2009learning
   year = {2009},
   type = {Technical Report}
 }
+
+@inproceedings{metaheuristics-cookbook,
+  author    = {Victoria Bibaeva},
+  title     = {Using Metaheuristics for Hyper-Parameter Optimization of Convolutional Neural Networks},
+  booktitle = {2018 IEEE International Workshop on Machine Learning for Signal Processing (MLSP)},
+  address   = {Aalborg, Denmark},
+  year      = {2018}
+}
+
+@article{cnn-explained-for-metaheuristics,
+  author  = {Sajjad Nematzadeh and Farzad Kiani and Mahsa Torkamanian-Afshar and Nizamettin Aydin},
+  title   = {Tuning hyperparameters of machine learning algorithms and deep neural networks using metaheuristics: A bioinformatics study on biomedical and biological cases},
+  journal = {Computational Biology and Chemistry},
+  volume  = {97},
+  pages   = {107619},
+  year    = {2022},
+  url     = {https://doi.org/10.1016/j.compbiolchem.2021.107619}
+}
+
+@article{hpo-experiment-on-cnn,
+  author  = {Mohammed Q. Ibrahim and Nazar K. Hussein and David Guinovart and Mohammed Qaraad},
+  title   = {Optimizing Convolutional Neural Networks: A Comprehensive Review of Hyperparameter Tuning Through Metaheuristic Algorithms},
+  journal = {Archives of Computational Methods in Engineering},
+  year    = {2025},
+  url     = {https://doi.org/10.1007/s11831-025-10292-x}
+}
+
+@inproceedings{autonomous-vehicle-appl,
+  author    = {Raja Ben Abdessalem and Shiva Nejati and Thomas Stifter},
+  title     = {Testing Vision-Based Control Systems Using Learnable Evolutionary Algorithms},
+  booktitle = {ICSE '18: Proceedings of the 40th International Conference on Software Engineering},
+  address   = {Gothenburg, Sweden},
+  year      = {2018},
+  url       = {https://doi.org/10.1145/3180155.3180160}
+}
+
+@misc{dt-scikit,
+  title  = {DecisionTreeClassifier},
+  author = {Scikit-Learn},
+  year   = {2025},
+  url    = {https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html}
+}
+
+@misc{knn-scikit,
+  title  = {KNeighborsClassifier},
+  author = {Scikit-Learn},
+  year   = {2025},
+  url    = {https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html}
+}
+
+@misc{mygithub-drugconsumpML,
+  title  = {Drug-Consumption-Machine-Learning-analysis},
+  author = {Kelvin Mock},
+  year   = {2024},
+  url    = {https://github.com/kmock930/Drug-Consumption-Machine-Learning-analysis}
+}
+
+@misc{explainer-linear,
+  title = {shap.LinearExplainer},
+  url   = {https://shap.readthedocs.io/en/latest/generated/shap.LinearExplainer.html}
+}

report/sections/1_introduction.tex

Lines changed: 7 additions & 9 deletions
@@ -1,24 +1,22 @@
 \section{Introduction}
 
-The performance of machine learning models often depends on their hyperparameters' high-level configuration variables like learning rate or batch size that control the training process. Finding the optimal set of these configurations, or Hyperparameter Optimization (HPO), is a significant and resource-intensive bottleneck in model development.
-
-HPO can be framed as a software verification problem. In this context, the model is the software under test and a "defect" being a suboptimal hyperparameter configuration that causes the model to fail its performance specifications, such as by exhibiting high loss, poor generalization or unstable training. HPO thus functions as a automated test drivers, searching the configuration space to find a set of hyperparameters that verifies the model's performance against a pre-defined quality specification.
+The performance of machine learning models relies heavily on hyperparameters: configuration variables, such as learning rate and batch size, that control the training process. Identifying the optimal configuration is a significant bottleneck in model development due to the high computational cost of evaluation. To address this complexity, we frame Hyperparameter Optimization (HPO) as a software verification problem. In this context, the model functions as the ``software under test,'' where a suboptimal configuration is treated as a ``defect'' that causes the system to violate its performance specifications (e.g., high loss or instability). HPO therefore acts as an automated test driver, searching the configuration space to verify model performance against defined quality criteria.
 
 \subsection{Evaluation Criteria}
 
-We evaluate the optimizers across three quality attributes, as defined in the project proposal:
+We evaluate the optimizers across three quality attributes:
 
 \begin{itemize}
 \item \textbf{Effectiveness}: The quality of the final solution found (i.e., the best fitness score achieved).
-\item \textbf{Efficiency}: The computational cost required to find a solution, measured in both fitness evaluations and wall-clock time.
-\item \textbf{Stability (Consistency)}: The consistency and reliability of the algorithm's performance across multiple independent runs.
+\item \textbf{Convergence}: The rate at which the algorithm improves its best-found solution over the course of the fixed evaluation budget.
+\item \textbf{Stability}: The consistency and reliability of the algorithm's performance across multiple independent runs (measured by variance).
 \end{itemize}
 
 \subsection{Research Questions}
 
-This report seeks to answer the following research questions from the project proposal:
+This report seeks to answer the following research questions:
 
 \begin{itemize}
-\item \textbf{RQ1}: How do representative metaheuristic algorithms compare against a randomized search baseline in terms of effectiveness and efficiency when performing HPO prior to training?
-\item \textbf{RQ2}: What is the difference in performance stability between the selected metaheuristic algorithms and traditional solutions like the randomized search baseline?
+\item \textbf{RQ1}: How do representative metaheuristic algorithms compare against a randomized search baseline in terms of effectiveness and convergence rate, given a fixed evaluation budget?
+\item \textbf{RQ2}: What is the difference in performance stability between the selected metaheuristic algorithms and the randomized search baseline?
 \end{itemize}
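The three criteria above map directly onto per-run optimization traces. Below is a minimal, hypothetical sketch (function and field names are ours, not the project's) assuming each run logs its best-so-far fitness after every evaluation of a shared fixed budget:

```python
import numpy as np

def summarize_runs(histories: list[list[float]]) -> dict:
    """Summarize effectiveness, convergence, and stability from per-run
    best-so-far fitness histories (higher fitness = better).

    histories[r][e] = best fitness seen by run r after evaluation e;
    all runs are assumed to share the same fixed evaluation budget.
    """
    curves = np.asarray(histories, dtype=float)   # shape: (runs, budget)
    finals = curves[:, -1]                        # best fitness per run

    return {
        # Effectiveness: quality of the final solution, averaged over runs.
        "effectiveness_mean": float(finals.mean()),
        # Convergence: mean height of the best-so-far curve over the budget;
        # higher means the optimizer improved earlier within the budget.
        "convergence_auc": float(curves.mean()),
        # Stability: spread of final fitness across independent runs.
        "stability_std": float(finals.std(ddof=1)),
    }

# Example with two toy runs of five evaluations each:
print(summarize_runs([[0.2, 0.5, 0.5, 0.6, 0.6],
                      [0.3, 0.3, 0.4, 0.4, 0.7]]))
```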
report/sections/2_problem_formulation.tex

Lines changed: 37 additions & 7 deletions
@@ -1,20 +1,50 @@
-
 \section{Problem Formulation}
 
-\subsection{Objective Function}
+\subsection{Representation and Objective Function}
 
-HPO is a black-box optimization problem. The objective function $f(\theta)$, which represents the model's performance for a given hyperparameter configuration $\theta$, presents many challenges: it is computationally expensive to evaluate, it is non-differentiable, and the search space $\Theta$ is often complex and of mixed-types (continuous, discrete, and categorical). These properties make HPO suitable for search-based metaheuristic techniques.
+HPO is a black-box optimization problem that takes place \textbf{prior to} the actual training loop. The problem is represented by arrays of the possible values of each hyperparameter type, listed in Table~\ref{tab:hparam_space}. The objective function $f(\theta)$, which represents the model's performance for a given hyperparameter configuration $\theta$, presents many challenges: it is computationally expensive to evaluate, it is non-differentiable, and the search space $\Theta$ is often complex and of mixed types (continuous, discrete, and categorical). These properties make HPO well suited to search-based metaheuristic techniques.
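To make the mixed-type search space concrete, the following sketch shows one way such a representation, and a single random draw from it (the core step of the RS baseline discussed next), could look. The hyperparameter names, ranges, and helper are illustrative assumptions, not the report's actual hyperparameter table:

```python
import random

# Hypothetical mixed-type search space: continuous, discrete, categorical.
SEARCH_SPACE = {
    "learning_rate": ("continuous", (1e-4, 1e-1)),    # uniform range
    "batch_size":    ("discrete",    [16, 32, 64, 128]),
    "criterion":     ("categorical", ["gini", "entropy"]),
}

def sample_configuration(space: dict) -> dict:
    """Draw one random configuration theta from the space (one RS step)."""
    theta = {}
    for name, (kind, domain) in space.items():
        if kind == "continuous":
            lo, hi = domain
            theta[name] = random.uniform(lo, hi)
        else:  # discrete and categorical domains are plain choices
            theta[name] = random.choice(domain)
    return theta

print(sample_configuration(SEARCH_SPACE))
```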
 
 \subsection{Algorithm Selection}
 
 \subsubsection{Baseline: Randomized Search}
 
-RS is the standard scientific baseline for HPO. \citet{bergstra2012random} demonstrated empirically that RS is more efficient than Grid Search for HPO, as it does not waste evaluations on unimportant parameters. Therefore, any intelligent algorithm must demonstrate superiority over RS to be considered effective.
+Random Search (RS) is the standard scientific baseline for HPO. \citet{bergstra2012random} demonstrated empirically that RS is more efficient than Grid Search for HPO. Therefore, any intelligent algorithm must demonstrate superiority over RS to be considered effective.
 
-\subsubsection{Genetic Algorithm}
+\subsubsection{Evolutionary Genetic Algorithm}
 
-TODO
+Inspired by Darwinian evolution, the Genetic Algorithm (GA) searches for optimal solutions using \textit{selection}, \textit{crossover}, and \textit{mutation}. We implement a \textbf{Memetic Algorithm} variant, which includes a local search component to escape fitness plateaus. As described in \cite{metaheuristics-cookbook}, radius-based elitism is applied before crossover to refine the fittest individuals.
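The radius-based local refinement can be pictured with a short hill-climbing sketch. This is our own illustration under stated assumptions (genes normalized to [0, 1], maximization), not the repository's implementation:

```python
import random

def local_search(individual, fitness, radius=0.15, tries=5):
    """Memetic refinement: hill-climb an elite individual by sampling
    neighbours within `radius` (genes assumed normalized to [0, 1])."""
    best, best_fit = individual, fitness(individual)
    for _ in range(tries):
        neighbour = [min(1.0, max(0.0, g + random.uniform(-radius, radius)))
                     for g in best]
        fit = fitness(neighbour)
        if fit > best_fit:                 # keep only improving moves
            best, best_fit = neighbour, fit
    return best

# Toy usage: maximize the negated distance to the point [0.5, 0.5].
f = lambda ind: -sum((g - 0.5) ** 2 for g in ind)
print(local_search([0.1, 0.9], f))
```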
 
 \subsubsection{Particle Swarm Optimization}
 
-PSO models a swarm where individuals are strongly influenced by the single best-found solution. This behaviour leads to rapid convergence, often finding a "good-enough" solution quickly. This same strength can also be a weakness, as it may converge prematurely to a suboptimal solution. The swarm can rapidly cluster around the first local optimum it finds, losing diversity and becoming "stuck" before the true global optimum is found.
+PSO models a swarm where individuals are influenced by both their personal best (\texttt{p\_best}) and the global best (\texttt{g\_best}) solutions. The velocity of each particle is updated using an inertia weight ($w$) and acceleration coefficients ($c_1$, $c_2$), balancing exploration and exploitation.
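For reference, the velocity and position updates the paragraph describes follow the standard PSO formulation, with $r_1, r_2 \sim U(0,1)$ drawn per dimension (the report's exact variant may differ):

```latex
v_i \leftarrow w\,v_i
  + c_1 r_1 \left(p^{\text{best}}_i - x_i\right)
  + c_2 r_2 \left(g^{\text{best}} - x_i\right),
\qquad
x_i \leftarrow x_i + v_i
```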
+
+\begin{table}[htbp]
+\centering
+\caption{Optimizer Configuration Parameters}
+\label{tab:algo_params}
+\small
+\begin{tabularx}{\textwidth}{lllX}
+\toprule
+\textbf{Algorithm} & \textbf{Parameter} & \textbf{Value} & \textbf{Description} \\
+\midrule
+\multirow{4}{*}{Genetic Alg.}
+ & Population & 30 & Number of individuals per generation. \\
+\cmidrule{2-4}
+ & Generations & 10 & Maximum total evolutionary iterations. \\
+\cmidrule{2-4}
+ & Elitism & 50\% & Proportion of the population preserved/selected. \\
+\cmidrule{2-4}
+ & Radius & 0.0 & Memetic local-search radius (0.15 when memetic search is enabled; 0 for standard GA runs). \\
+\midrule
+\multirow{4}{*}{PSO}
+ & Particles & 10 & Size of the swarm. \\
+\cmidrule{2-4}
+ & $w$ (Inertia) & 0.5 & Inertia weight controlling velocity retention. \\
+\cmidrule{2-4}
+ & $c_1$ (Cognitive) & 1.5 & Weight for personal best influence. \\
+\cmidrule{2-4}
+ & $c_2$ (Social) & 1.5 & Weight for global best influence. \\
+\bottomrule
+\end{tabularx}
+\end{table}
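Assuming the table's parameters map one-to-one onto optimizer constructor arguments (the names below are hypothetical; the repository's actual API may differ), the two configurations could be expressed as:

```python
# Hypothetical configuration objects mirroring the table
# "Optimizer Configuration Parameters"; names are illustrative.
GA_CONFIG = {
    "population": 30,   # individuals per generation
    "generations": 10,  # maximum evolutionary iterations
    "elitism": 0.5,     # proportion of population preserved/selected
    "radius": 0.15,     # memetic local-search radius (0.0 for standard GA)
}

PSO_CONFIG = {
    "particles": 10,    # swarm size
    "w": 0.5,           # inertia weight (velocity retention)
    "c1": 1.5,          # cognitive coefficient (personal best)
    "c2": 1.5,          # social coefficient (global best)
}
```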