
soualahmohammedzakaria/Anti-Forgetting-Benchmark


Continual Learning: Comparing Anti-Forgetting Methods

Note: For a detailed in-depth analysis, please read the full article PDF: comparing_antiforgetting_methods_article.pdf

This repository explores and compares various continual learning strategies designed to mitigate catastrophic forgetting in neural networks. When models learn new tasks sequentially, they often drastically lose performance on previously learned tasks. The methods analyzed here aim to preserve knowledge of old tasks while actively learning new ones.

Approach & Methods

The study evaluates several popular continual learning algorithms:

  1. Naive Fine-Tuning (finetune): The baseline approach where the model is simply trained on new tasks sequentially without any specific mechanisms to prevent forgetting.
  2. Elastic Weight Consolidation (ewc): A regularization-based method that constrains the updates of weights deemed critical to previously learned tasks.
  3. Learning without Forgetting (lwf): A distillation-based method that uses knowledge distillation from the old model to the new one to retain previous knowledge.
  4. Experience Replay (replay): A rehearsal-based approach that stores a small buffer of samples from past tasks and interleaves them with the new task data during training.
  5. Synaptic Intelligence (si): A parameter-regularization method that continuously updates importance measures for each synapse (weight) throughout the entire training trajectory.
  6. Gradient Episodic Memory (gem): A rehearsal-based approach that adds a constraint on the gradients to ensure that updates for the current task do not increase the loss on previously learned tasks stored in an episodic memory.
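To make the regularization idea concrete, here is a minimal NumPy sketch of the EWC penalty described above. This is an illustrative implementation, not the repository's code: the function name `ewc_penalty`, the flat parameter vectors, and the scaling factor `lam` are assumptions for the example.

```python
import numpy as np

def ewc_penalty(params, old_params, fisher, lam=1.0):
    """Quadratic penalty that anchors important weights near their old values.

    params, old_params, fisher: flat parameter vectors of equal length
    (hypothetical shapes). `fisher` approximates how important each weight
    is to previously learned tasks, so important weights are pulled back
    toward their old values more strongly than unimportant ones.
    """
    return 0.5 * lam * np.sum(fisher * (params - old_params) ** 2)

# During training on a new task, this term is added to the task loss:
# total_loss = task_loss + ewc_penalty(params, old_params, fisher, lam)
```

In practice the Fisher values are estimated from squared gradients of the log-likelihood on the previous task's data; the sketch only shows how they enter the loss.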

Metrics Evaluated

The models are compared based on standard continual learning metrics:

  • Average Accuracy ($\bar{A}$): The mean accuracy across all tasks after training on the entire sequence.
  • Average Forgetting ($\bar{F}$): The average decrease in performance on previous tasks as new tasks are learned.
  • Backward Transfer (BWT): The influence that learning a new task has on the performance on previous tasks (positive BWT implies improvement on older tasks; negative BWT implies catastrophic forgetting).
  • Forward Transfer (FWT): The influence that learning previous tasks has on the learning of a new task.
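These metrics can all be computed from the standard accuracy matrix $R$, where $R_{i,j}$ is the accuracy on task $j$ after training on task $i$. The sketch below uses the common definitions from the continual-learning literature; the function name `cl_metrics` and the baseline vector `b` (random-init accuracy per task, needed only for FWT) are assumptions for the example.

```python
import numpy as np

def cl_metrics(R, b=None):
    """Compute continual-learning metrics from a T x T accuracy matrix.

    R[i, j] = accuracy on task j after training on task i.
    b[j]    = accuracy of a randomly initialized model on task j
              (hypothetical baseline, required only for forward transfer).
    """
    T = R.shape[0]
    avg_acc = R[-1].mean()  # mean final accuracy over all tasks
    # Forgetting: best accuracy ever reached on a task minus its final accuracy.
    forgetting = np.mean([R[:-1, j].max() - R[-1, j] for j in range(T - 1)])
    # Backward transfer: final accuracy minus accuracy right after learning the task.
    bwt = np.mean([R[-1, j] - R[j, j] for j in range(T - 1)])
    fwt = None
    if b is not None:
        # Forward transfer: accuracy on a task just before training on it,
        # relative to the random-init baseline.
        fwt = np.mean([R[j - 1, j] - b[j] for j in range(1, T)])
    return avg_acc, forgetting, bwt, fwt
```

Note that when a task's accuracy peaks immediately after it is learned, $\bar{F}$ and BWT differ only in sign, which matches most rows of the results table below.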

Results

A comprehensive evaluation conducted across 5 sequential tasks shows significant variation in each method's ability to counteract catastrophic forgetting. Learning without Forgetting (LwF) and Experience Replay achieved high average accuracy and low average forgetting.

Summary Metrics

| Method | Average Accuracy ($\bar{A}$) | Average Forgetting ($\bar{F}$) | Backward Transfer (BWT) | Forward Transfer (FWT) |
| --- | --- | --- | --- | --- |
| Fine-Tuning | 66.65% | 35.25% | -35.25% | -1.65% |
| EWC | 84.62% | 4.08% | +2.21% | +0.64% |
| LwF | 93.04% | 1.92% | -1.92% | -4.31% |
| Replay | 88.80% | 6.81% | -6.81% | -0.44% |
| SI | 71.73% | 26.60% | -26.60% | +0.26% |
| GEM | 86.84% | 8.73% | -8.73% | -1.46% |

The plots below depict the learning trajectory and per-task results, visualizing how performance drops off on each task as new ones are learned:

Plot: Learning Trajectory (performance progression over time)

Plot: Average Forgetting

Plot: Final Accuracy Per Task

Conclusion

Naive fine-tuning suffers from severe catastrophic forgetting (highest $\bar{F}$ at 35.25%). Modern anti-forgetting strategies, notably LwF, EWC, and Replay, successfully mitigate this. Learning without Forgetting (LwF) achieved the highest overall Average Accuracy (93.04%) with the lowest Average Forgetting (1.92%) on the evaluated benchmark.
