
Commit 87f837f

Updated Report (Latex)

1 parent 432546b commit 87f837f

4 files changed: 7 additions & 8 deletions

report/.gitignore

Lines changed: 0 additions & 3 deletions

@@ -12,6 +12,3 @@
*.fdb_latexmk
*.fls

- # PDF output (generated from .tex)
- *.pdf
-

report/sections/3_experiment.tex

Lines changed: 2 additions & 2 deletions
@@ -22,7 +22,7 @@ \subsubsection{Models}

\paragraph{Decision Tree (DT)} DT is by nature the simplest of our models. It is tree-based, meaning it makes predictions through a sequence of binary predicates. Its architecture is inexpensive to train, and it is highly explainable to non-technical audiences. It is also widely used in real-world production-grade systems such as autonomous vehicles \cite{autonomous-vehicle-appl}. Given its simplicity and popularity, we start our analysis by exploring which parameters minimally have to be tuned for the simplest model, and how it performs during tuning with metaheuristics. For simplicity, a prebuilt implementation from \cite{dt-scikit} is used.
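A minimal sketch of what "minimally tuned" could look like here (an illustration, not the report's exact setup: the search-space bounds and the fitness function below are assumptions; only DecisionTreeClassifier and cross_val_score are real scikit-learn names):

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical minimal search space; bounds are illustrative only.
DT_SPACE = {
    "criterion": ["gini", "entropy"],
    "max_depth": list(range(2, 31)),
    "min_samples_split": list(range(2, 21)),
    "min_samples_leaf": list(range(1, 11)),
}

def dt_fitness(params, X, y):
    """Fitness of one sampled configuration = mean 3-fold CV accuracy."""
    model = DecisionTreeClassifier(**params, random_state=0)
    return cross_val_score(model, X, y, cv=3).mean()
```

Any of the optimizers (GA, PSO, Random Search) then only has to sample configurations from DT_SPACE and maximize dt_fitness.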

- \paragraph{K-Nearest Neighbors (KNN)} We utilize \textit{KNeighborsClassifier} from \cite{knn-scikit}, which predicts based on the class of the nearest neighbors among existing data instances. From the perspective of explainability, a prior work \cite{mygithub-drugconsumpML} suggests that, depending on the dataset, a KNN classifier can sometimes be explained as linear-based; but in the case of an image dataset, the KNN in our experiment predicts from high-dimensional image arrays, whose predictions need to be generalized by a kernel-based explainer. The model's architecture itself is not complex, but the dataset involved makes training somewhat heavier. We use it as an alternative model type in our experiment.
+ \paragraph{K-Nearest Neighbors (KNN)} We utilize \textit{KNeighborsClassifier} from \cite{knn-scikit}, which is primarily kernel-based and predicts based on the class of the nearest neighbors among existing data instances. From the perspective of explainability, a prior work \cite{mygithub-drugconsumpML} suggests that, depending on the dataset, a KNN classifier can sometimes be explained as linear-based; but in the case of an image dataset, the KNN in our experiment predicts from high-dimensional image arrays, whose predictions need to be generalized by a kernel-based explainer. The model's architecture itself is not complex, but the dataset involved makes training somewhat heavier. We use it as an alternative model type in our experiment.
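For concreteness, a hedged sketch of such a setup (assumed, not the report's code: the flattening step and the 80/20 holdout are illustrative; KNeighborsClassifier is the real scikit-learn class):

```python
from sklearn.neighbors import KNeighborsClassifier

def knn_fitness(params, X_images, y):
    """Score one KNN configuration on flattened image arrays."""
    # (n, H, W[, C]) pixel arrays -> (n, features) vectors for KNN.
    X = X_images.reshape(len(X_images), -1)
    split = int(0.8 * len(X))  # illustrative holdout split
    model = KNeighborsClassifier(**params)  # e.g. {"n_neighbors": 5, "weights": "distance"}
    model.fit(X[:split], y[:split])
    return model.score(X[split:], y[split:])
```

The high dimensionality of X is what makes each distance query, and hence each fitness evaluation, heavier than for the DT.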

\paragraph{Convolutional Neural Network (CNN)} While the models above exhibit simpler architectures, CNN is a common deep learning architecture in practical work that learns image-recognition tasks more efficiently. Ours is a three-layer neural network with batch normalization and a fully-connected head, built on operations known as ``convolutions'': each convolution applies a small window of weights, known as a ``kernel'' or ``filter'', to subsets of pixels, iteratively learning local patterns. Custom neural networks in the real world generally involve far more hyper-parameters during training, and thus tuning them is computationally much heavier than tuning the prebuilt models above. Moreover, bad hyper-parameters can increase the resulting error rate of the model \cite{metaheuristics-cookbook}. A suitable metaheuristic search therefore plays an important role here: it helps determine the best set of hyper-parameters with fewer resources than an exhaustive search. Figure~\ref{fig:cnn_arch} summarizes the CNN backbone used in our experiments.
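A sketch of the kind of backbone this paragraph describes (an assumption for illustration: the framework (PyTorch), the channel widths, the 32x32 input, and the 10-class head are not taken from the report; Figure \ref{fig:cnn_arch} shows the actual design):

```python
import torch.nn as nn

class SmallCNN(nn.Module):
    """Three convolutional blocks with batch normalization,
    followed by a fully-connected classification head."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32), nn.ReLU(), nn.MaxPool2d(2),   # 32x32 -> 16x16
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64), nn.ReLU(), nn.MaxPool2d(2),   # 16x16 -> 8x8
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128), nn.ReLU(), nn.MaxPool2d(2),  # 8x8 -> 4x4
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 4 * 4, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```

The hyper-parameters exposed to the optimizer would then include the learning rate, batch size, and widths like the channel counts above, which is what makes CNN tuning the heaviest of the three models.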

@@ -152,7 +152,7 @@ \subsubsection{Justification}

\paragraph{Accuracy} Accuracy is the simplest metric: it only measures the proportion of correctly classified samples (regardless of their labels) over the entire dataset. Thus, it may be too naive to reveal the insights that the other metrics can provide.

- \paragraph{Micro F1} As suggested by the formula of the macro F1 score above, micro F1 just considers the precision-recall balance within one class. The dataset under consideration consists of 10 classes, a relatively large number, so this could be even more naive and redundant next to the macro F1 consideration above.
+ \paragraph{Micro F1} Following the idea of the macro F1 score above, micro F1 instead pools the per-class counts across all classes rather than averaging per-class scores. For a single-label task over our 10 classes it effectively coincides with accuracy, so it could be even more naive and redundant next to the macro F1 consideration above.
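For reference, the two aggregation schemes these paragraphs contrast can be written out over $C = 10$ classes with per-class counts $TP_c$, $FP_c$, $FN_c$ (standard textbook definitions, not copied from the report):

```latex
\[
F1_{\mathrm{macro}} = \frac{1}{C}\sum_{c=1}^{C} F1_c,
\qquad
F1_{\mathrm{micro}} = \frac{2\,P_\mu R_\mu}{P_\mu + R_\mu},
\quad
P_\mu = \frac{\sum_{c} TP_c}{\sum_{c}\left(TP_c + FP_c\right)},
\quad
R_\mu = \frac{\sum_{c} TP_c}{\sum_{c}\left(TP_c + FN_c\right)}.
\]
```

In single-label classification every error counts once as a false positive and once as a false negative, so micro F1 reduces to accuracy, which is the redundancy noted above.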

\begin{table}[htbp]
\centering

report/sections/4_results.tex

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@ \subsection{RQ1: Effectiveness and Convergence}

\subsection{RQ2: Stability}

- The stability of final solutions, shown by box plots of fitness across 10 runs (Figure \ref{fig:boxplot}), was high for all models. Variance was minimal for DT and KNN. For CNN, all optimizers produced similar interquartile ranges (approximately 0.79 to 0.83), with minor differences in outlier counts (GA: 1, PSO: 2, RS: 1).
+ The stability of final solutions, shown by box plots of fitness across 10 runs (Figure \ref{fig:boxplot}), was high for all models. Variance was minimal for DT and KNN. For CNN, all optimizers produced similar inter-quartile ranges (approximately 0.79 to 0.83), with minor differences in outlier counts (GA: 1, PSO: 2, RS: 1).
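A sketch of how such a box plot can be produced (assumed data layout: the final fitness of each of the 10 runs per optimizer; function and file names are illustrative):

```python
import matplotlib.pyplot as plt

def plot_stability(fitness_by_optimizer):
    """fitness_by_optimizer: {"GA": [...], "PSO": [...], "RS": [...]},
    each list holding the 10 final fitness values of one optimizer."""
    names = list(fitness_by_optimizer)
    data = [fitness_by_optimizer[k] for k in names]
    fig, ax = plt.subplots()
    # The default whisker rule (1.5 x IQR) is what flags outliers.
    ax.boxplot(data, labels=names)
    ax.set_ylabel("Final fitness across 10 runs")
    fig.savefig("boxplot.pdf")
```

Matplotlib's default whisker rule marks any run beyond 1.5 times the inter-quartile range as an outlier, which is how per-optimizer counts like GA: 1, PSO: 2, RS: 1 arise.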

\begin{figure}[H]
\centering

report/sections/6_conclusion.tex

Lines changed: 4 additions & 2 deletions
@@ -2,14 +2,16 @@ \section{Conclusion}

This study evaluated metaheuristic optimizers (GA, PSO) against a Randomized Search baseline under a strict budget of 50 evaluations. No significant performance difference was found between methods across Decision Tree, KNN, or CNN models ($p > 0.05$), with all reaching similar fitness plateaus.

- The result is explained by initialization overhead: with a population of 30, GA used 60\% of its budget on initial sampling, leaving too few evaluations for evolutionary operators to yield improvement. In such micro-budget regimes (budget $< 2 \times$ population), population-based methods behave similarly to Random Search.
+ \paragraph{Addressing the Limitation} The result is explained by initialization overhead: with a population of 30, GA used 60\% of its budget on initial sampling, leaving too few evaluations for evolutionary operators to yield improvement. In such micro-budget regimes (budget $< 2 \times$ population), population-based methods behave similarly to Random Search.
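As a worked check of this arithmetic (numbers from the paragraph above; the per-generation cost assumes a generational GA that evaluates a full population per iteration):

```latex
\[
\frac{30}{50} = 60\%\ \text{of the budget spent on initialization},
\qquad
50 - 30 = 20 < 30\ \text{evaluations per generation},
\]
```

so after initialization the GA cannot complete even one full follow-up generation, giving crossover and mutation almost no iterations in which to beat random sampling.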

Therefore, for hyperparameter optimization with very limited evaluations, the added complexity of metaheuristics like GA and PSO is not justified. Random Search proved an equally effective baseline under these constraints.

Future work should:
\begin{itemize}
\item Increase evaluation budgets to allow amortization of initialization costs.
- \item Explore adaptive or budget-aware variants of GA and PSO.
+ \item Explore adaptive or budget-aware variants of GA and PSO \cite{hpo-experiment-on-cnn}.
\item Extend experiments to broader datasets and hyperparameter spaces.
+ \item Design systematic testing on larger real-world datasets to demonstrate applicability and generalizability to real-world problems \cite{hpo-experiment-on-cnn}.
\item Use larger run counts to improve statistical power.
+ \item Experiment with hybrid metaheuristics or hybrid models, such as a CNN combined with \textit{XGBoost} \cite{hpo-experiment-on-cnn}, to combine the advantages and balance the trade-offs of traditional methods like PSO and GA.
\end{itemize}
