Commit 50d24a5

Add current fitness evaluation figure
1 parent ba18e15

2 files changed

Lines changed: 13 additions & 2 deletions

File tree

report/figures/evaluations.png

328 KB

report/sections/4_results.tex

Lines changed: 13 additions & 2 deletions
@@ -9,12 +9,21 @@ \subsection{RQ1: Effectiveness and Convergence}
 \begin{figure}[H]
 \centering
 \includegraphics[width=\textwidth]{./figures/convergence.png}
-\caption{Convergence behavior of GA, PSO, and RS across 50 evaluations for all three models.}
+\caption{Best fitness convergence behavior of GA, PSO, and RS across 50 evaluations for all three models.}
 \label{fig:convergence}
 \end{figure}
 
 For the \textbf{Decision Tree} and \textbf{KNN}, the search space was relatively small. Consequently, all three optimizers rapidly converged to near-identical optimal configurations. As shown in the final test performance (Figure \ref{fig:test_perf}), the DT achieved a fitness of $\approx 0.3384$, while the KNN achieved $\approx 0.4308$.
 
+We also plot the mean and standard deviation of the current fitness at each of the $n$ evaluations for every optimizer. As shown in Figure \ref{fig:evaluations}, the current fitness generally improves over the evaluations, though with varying efficiency.
+
+\begin{figure}[H]
+\centering
+\includegraphics[width=\textwidth]{./figures/evaluations.png}
+\caption{Current fitness behavior of GA, PSO, and RS across 50 evaluations for all three models.}
+\label{fig:evaluations}
+\end{figure}
+
 For the \textbf{CNN}, which possesses the largest and most complex search space, we observed distinct behaviors. While GA (orange line in Figure \ref{fig:convergence}) started with lower fitness, it showed steady improvement. Random Search (RS), surprisingly, started with high fitness in several runs, likely because random sampling is effective in high-dimensional spaces where a few parameters dominate performance. Ultimately, all algorithms converged to a test performance of approximately $0.77$ (Figure \ref{fig:test_perf}).
 
 \begin{figure}[H]
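The per-evaluation mean and standard deviation shown in Figure \ref{fig:evaluations} can be computed with a few lines of NumPy. A minimal sketch follows; the `fitness` array is a hypothetical stand-in for the logged runs (shape: runs × evaluations), not the project's actual data:

```python
import numpy as np

rng = np.random.default_rng(0)
n_runs, n_evals = 10, 50  # hypothetical repeated runs, 50-evaluation budget

# Stand-in for one optimizer's logged current fitness:
# one row per independent run, one column per evaluation.
trend = 0.4 + 0.4 * (1 - np.exp(-np.arange(n_evals) / 15.0))
fitness = trend + rng.normal(0.0, 0.03, size=(n_runs, n_evals))

# Aggregate across runs: the mean gives the curve, the
# standard deviation gives the shaded band around it.
mean = fitness.mean(axis=0)
std = fitness.std(axis=0, ddof=1)
```

With matplotlib, `plt.plot(evals, mean)` plus `plt.fill_between(evals, mean - std, mean + std, alpha=0.3)` then renders one curve of the figure per optimizer.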
@@ -65,4 +74,6 @@ \subsection{Statistical Significance}
 \item \textbf{Identical Performance (DT):} GA-Standard vs PSO remains $p=1.000$, indicating indistinguishable outcomes on Decision Trees.
 \item \textbf{Near Significance (KNN):} PSO vs RS on KNN is $p=0.094$, hinting that PSO may modestly outperform RS, but it does not reach $\alpha=0.05$.
 \item \textbf{Memetic Variants:} GA-Memetic comparisons (vs GA-Standard, PSO, RS) are all non-significant ($p > 0.05$), showing no measurable improvement over the standard GA under our budget.
-\end{itemize}
+\end{itemize}
+
+Given the GA population size of $30$ and the strict budget of $50$ evaluations, the algorithm is structurally capped at fewer than two full generations: $30$ evaluations are spent on initialization, leaving only $20$ offspring evaluations (about $0.67$ of a generation). Roughly $60\%$ of the budget is therefore consumed by warm-up sampling, so the GA behaves similarly to Random Search under this constraint. This limited evolutionary pressure helps explain the non-significant differences in Table \ref{tab:wilcoxon}, as crossover and mutation had too few iterations to drive convergence.
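The budget arithmetic in the added paragraph can be checked directly. This sketch assumes the setup described in the text (population-sized initialization, then one fitness evaluation per offspring):

```python
population = 30  # GA population size
budget = 50      # total fitness evaluations allowed

# Initialization evaluates every individual once, so the
# remaining budget is what actually feeds evolution.
offspring_evals = budget - population       # evaluations left for offspring
generations = offspring_evals / population  # fraction of one full generation
warmup_share = population / budget          # budget share spent on warm-up

print(offspring_evals, round(generations, 2), round(warmup_share, 2))
# → 20 0.67 0.6
```

The 60% warm-up share is exactly the "behaves similarly to Random Search" regime: most sampled points are uniform-random, with selection pressure applied to only 20 evaluations.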
