A Homogeneous Transformer Architecture

While the Transformer architecture has made a substantial impact in the field of machine learning, it is unclear what purpose each component serves in the overall architecture. Heterogeneous nonlinear circuits such as multi-layer ReLU networks are interleaved with layers of softmax units. We introd...

Bibliographic Details
Main Authors: Gan, Yulu; Poggio, Tomaso
Format: Article
Published: Center for Brains, Minds and Machines (CBMM) 2023
Online Access: https://hdl.handle.net/1721.1/152178