Can we observe grokking on modular addition in a toy example?
This is inspired by https://arxiv.org/abs/2301.05217, but runs on an MLP instead of a transformer.
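For readers unfamiliar with the task, here is a minimal sketch of what a modular addition dataset could look like. The function name `make_modular_addition_data`, the modulus `p=113`, the 30% training fraction, and the seed are all illustrative assumptions, not values taken from this repository.

```python
import random


def make_modular_addition_data(p=113, train_fraction=0.3, seed=0):
    """Return (train, test) lists of ((a, b), (a + b) % p) examples.

    All defaults are hypothetical choices for illustration only.
    """
    # Enumerate every ordered pair (a, b) with 0 <= a, b < p.
    examples = [((a, b), (a + b) % p) for a in range(p) for b in range(p)]
    # Shuffle with a fixed seed so the split is reproducible.
    rng = random.Random(seed)
    rng.shuffle(examples)
    # Keep a fraction for training and hold out the rest for testing.
    n_train = int(train_fraction * len(examples))
    return examples[:n_train], examples[n_train:]


train, test = make_modular_addition_data()
```

In a setup like this, grokking would typically appear as test accuracy on the held-out pairs rising long after training accuracy has saturated.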