
soualahmohammedzakaria/Anti-Forgetting-Benchmark


Continual Learning: Comparing Anti-Forgetting Methods

Note: For a detailed in-depth analysis, please read the full article PDF: comparing_antiforgetting_methods_article.pdf

This repository explores and compares various continual learning strategies designed to mitigate catastrophic forgetting in neural networks. When models learn new tasks sequentially, they often drastically lose performance on previously learned tasks. The methods analyzed here aim to preserve knowledge of old tasks while actively learning new ones.

Approach & Methods

The study evaluates several popular continual learning algorithms:

  1. Naive Fine-Tuning (finetune): The baseline approach where the model is simply trained on new tasks sequentially without any specific mechanisms to prevent forgetting.
  2. Elastic Weight Consolidation (ewc): A regularization-based method that constrains the updates of weights deemed critical to previously learned tasks.
  3. Learning without Forgetting (lwf): A distillation-based method that uses knowledge distillation from the old model to the new one to retain previous knowledge.
  4. Experience Replay (replay): A rehearsal-based approach that stores a small buffer of samples from past tasks and interleaves them with the new task data during training.
  5. Synaptic Intelligence (si): A parameter-regularization method that continuously updates importance measures for each synapse (weight) throughout the entire training trajectory.
  6. Gradient Episodic Memory (gem): A rehearsal-based approach that adds a constraint on the gradients to ensure that updates for the current task do not increase the loss on previously learned tasks stored in an episodic memory.
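To make the regularization idea concrete, here is a minimal NumPy sketch of the EWC penalty described above. This is an illustrative implementation, not the repository's code: the function name `ewc_penalty`, the flat parameter vectors, and the scaling factor `lam` are assumptions for the example.

```python
import numpy as np

def ewc_penalty(params, old_params, fisher, lam=1.0):
    """Quadratic penalty that anchors important weights near their old values.

    params, old_params, fisher: flat parameter vectors of equal length
    (hypothetical shapes). `fisher` approximates how important each weight
    is to previously learned tasks, so important weights are pulled back
    toward their old values more strongly than unimportant ones.
    """
    return 0.5 * lam * np.sum(fisher * (params - old_params) ** 2)

# During training on a new task, this term is added to the task loss:
# total_loss = task_loss + ewc_penalty(params, old_params, fisher, lam)
```

In practice the Fisher values are estimated from squared gradients of the log-likelihood on the previous task's data; the sketch only shows how they enter the loss.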

Metrics Evaluated

The models are compared based on standard continual learning metrics:

  • Average Accuracy ($\bar{A}$): The mean accuracy across all tasks after training on the entire sequence.
  • Average Forgetting ($\bar{F}$): The average decrease in performance on previous tasks as new tasks are learned.
  • Backward Transfer (BWT): The influence that learning a new task has on the performance on previous tasks (positive BWT implies improvement on older tasks; negative BWT implies catastrophic forgetting).
  • Forward Transfer (FWT): The influence that learning previous tasks has on the learning of a new task.
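These metrics can all be computed from the standard accuracy matrix $R$, where $R_{i,j}$ is the accuracy on task $j$ after training on task $i$. The sketch below uses the common definitions from the continual-learning literature; the function name `cl_metrics` and the baseline vector `b` (random-init accuracy per task, needed only for FWT) are assumptions for the example.

```python
import numpy as np

def cl_metrics(R, b=None):
    """Compute continual-learning metrics from a T x T accuracy matrix.

    R[i, j] = accuracy on task j after training on task i.
    b[j]    = accuracy of a randomly initialized model on task j
              (hypothetical baseline, required only for forward transfer).
    """
    T = R.shape[0]
    avg_acc = R[-1].mean()  # mean final accuracy over all tasks
    # Forgetting: best accuracy ever reached on a task minus its final accuracy.
    forgetting = np.mean([R[:-1, j].max() - R[-1, j] for j in range(T - 1)])
    # Backward transfer: final accuracy minus accuracy right after learning the task.
    bwt = np.mean([R[-1, j] - R[j, j] for j in range(T - 1)])
    fwt = None
    if b is not None:
        # Forward transfer: accuracy on a task just before training on it,
        # relative to the random-init baseline.
        fwt = np.mean([R[j - 1, j] - b[j] for j in range(1, T)])
    return avg_acc, forgetting, bwt, fwt
```

Note that when a task's accuracy peaks immediately after it is learned, $\bar{F}$ and BWT differ only in sign, which matches most rows of the results table below.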

Results

A comprehensive evaluation conducted across 5 sequential tasks shows significant variation in each method's ability to counteract catastrophic forgetting. Learning without Forgetting (LwF) and Experience Replay achieved high average accuracy and low average forgetting.

Summary Metrics

| Method | Average Accuracy ($\bar{A}$) | Average Forgetting ($\bar{F}$) | Backward Transfer (BWT) | Forward Transfer (FWT) |
| --- | --- | --- | --- | --- |
| Fine-Tuning | 66.65% | 35.25% | -35.25% | -1.65% |
| EWC | 84.62% | 4.08% | +2.21% | +0.64% |
| LwF | 93.04% | 1.92% | -1.92% | -4.31% |
| Replay | 88.80% | 6.81% | -6.81% | -0.44% |
| SI | 71.73% | 26.60% | -26.60% | +0.26% |
| GEM | 86.84% | 8.73% | -8.73% | -1.46% |

The plots below depict the learning trajectory and per-task results, visualizing how performance drops off on each task as new ones are learned:

Plot: Learning Trajectory (performance progression over time)

Plot: Average Forgetting

Plot: Final Accuracy Per Task

Conclusion

Naive fine-tuning suffers from severe catastrophic forgetting (highest $\bar{F}$ at 35.25%). Modern anti-forgetting strategies, notably LwF, EWC, and Replay, successfully mitigate this. Learning without Forgetting (LwF) achieved the highest overall Average Accuracy (93.04%) with the lowest Average Forgetting (1.92%) on the evaluated benchmark.
