A Homogeneous Transformer Architecture
While the Transformer architecture has made a substantial impact in the field of machine learning, it is unclear what purpose each component serves in the overall architecture. Heterogeneous nonlinear circuits such as multi-layer ReLU networks are interleaved with layers of softmax units. We introduce here a homogeneous architecture based on Hyper Radial Basis Function (HyperBF) units. Evaluations on CIFAR10, CIFAR100, and Tiny ImageNet demonstrate performance comparable to standard vision transformers.
Main Authors: Gan, Yulu; Poggio, Tomaso
Format: Article (Technical Report / Working Paper)
Published: Center for Brains, Minds and Machines (CBMM), 2023
Series: CBMM Memo; 143
Online Access: https://hdl.handle.net/1721.1/152178
Funding: This material is based upon work supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216.
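The abstract's central object is the HyperBF unit, which replaces the heterogeneous mix of ReLU MLPs and softmax attention with a single kind of nonlinearity. As a rough illustration only, the sketch below implements one Gaussian HyperBF layer in PyTorch, following the classical formulation f(x) = Σₙ cₙ exp(−‖W(x − tₙ)‖²) with learnable centers tₙ, coefficients cₙ, and a scaling matrix W. The class name, the restriction of W to a diagonal, and all hyperparameters are assumptions for the sketch, not the memo's exact design.

```python
# Minimal sketch of a Gaussian HyperBF (Hyper Radial Basis Function) unit.
# Assumes the classical RBF formulation with learnable centers and a
# learnable diagonal Mahalanobis scaling; the memo's parameterization
# may differ.
import torch
import torch.nn as nn


class HyperBF(nn.Module):
    """f(x) = sum_n c_n * exp(-||W (x - t_n)||^2), with t_n, c_n, W learned."""

    def __init__(self, in_dim: int, num_centers: int, out_dim: int):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_centers, in_dim))  # centers t_n
        self.log_scale = nn.Parameter(torch.zeros(in_dim))             # diag(W), log-parameterized so scales stay positive
        self.coeffs = nn.Linear(num_centers, out_dim, bias=False)      # coefficients c_n

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_dim); compute weighted squared distance to every center
        diff = x.unsqueeze(1) - self.centers.unsqueeze(0)   # (batch, N, in_dim)
        w = torch.exp(self.log_scale)                       # positive diagonal scaling
        dist2 = ((diff * w) ** 2).sum(dim=-1)               # (batch, N)
        activations = torch.exp(-dist2)                     # Gaussian radial responses
        return self.coeffs(activations)                     # (batch, out_dim)


# Usage: one unit mapping 64-d token features to 64-d outputs.
layer = HyperBF(in_dim=64, num_centers=16, out_dim=64)
tokens = torch.randn(8, 64)
out = layer(tokens)  # shape: (8, 64)
```

Because every layer is the same kind of radial unit, stacking such blocks yields the "homogeneous" architecture the title refers to, in contrast to alternating ReLU and softmax circuits.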