Beyond Memorization: Exploring the Dynamics of Grokking in Sparse Neural Networks
In the domain of machine learning, "grokking" is a phenomenon where neural network models demonstrate a sudden improvement in generalization, distinct from traditional learning phases, long after the initial training appears complete. This behavior was first identified by Power et al. (202...
Main Author: | |
---|---|
Other Authors: | |
Format: | Thesis |
Published: | Massachusetts Institute of Technology, 2024 |
Online Access: | https://hdl.handle.net/1721.1/156751 |