A Homogeneous Transformer Architecture
While the Transformer architecture has made a substantial impact in the field of machine learning, it is unclear what purpose each component serves in the overall architecture. Heterogeneous nonlinear circuits such as multi-layer ReLU networks are interleaved with layers of softmax units. We introd...
Main Authors: | , |
---|---|
Format: | Article |
Published: | Center for Brains, Minds and Machines (CBMM), 2023 |
Online Access: | https://hdl.handle.net/1721.1/152178 |