A Homogeneous Transformer Architecture

While the Transformer architecture has made a substantial impact in the field of machine learning, it is unclear what purpose each component serves in the overall architecture. Heterogeneous nonlinear circuits, such as multi-layer ReLU networks, are interleaved with layers of softmax units. We introduce here a homogeneous architecture based on Hyper Radial Basis Function (HyperBF) units. Evaluations on CIFAR10, CIFAR100, and Tiny ImageNet demonstrate performance comparable to standard vision transformers.
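For context, a HyperBF unit computes a Gaussian of a learned weighted distance to a trainable center, y = sum_i c_i exp(-||W(x - t_i)||^2), rather than a ReLU or softmax nonlinearity. The sketch below is a minimal generic illustration of such a layer, not the authors' implementation; the class name, tensor shapes, and the choice of a single metric W shared across all centers are assumptions made for the example.

    import torch
    import torch.nn as nn

    class HyperBFLayer(nn.Module):
        """Minimal HyperBF layer: Gaussian radial basis functions with
        trainable centers t_i, a learned input metric W, and linear
        output coefficients c_i (hypothetical names, for illustration)."""

        def __init__(self, in_dim: int, num_centers: int, out_dim: int):
            super().__init__()
            self.centers = nn.Parameter(torch.randn(num_centers, in_dim))  # t_i
            self.metric = nn.Parameter(torch.eye(in_dim))                  # W
            self.coef = nn.Parameter(torch.randn(num_centers, out_dim))    # c_i

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, in_dim); pairwise differences to every center
            diff = x.unsqueeze(1) - self.centers          # (batch, K, in_dim)
            # squared weighted norm ||W (x - t_i)||^2
            d2 = (diff @ self.metric.T).pow(2).sum(-1)    # (batch, K)
            act = torch.exp(-d2)                          # Gaussian activations
            return act @ self.coef                        # (batch, out_dim)

Because the same radial unit can, in principle, stand in for both the attention softmax and the MLP nonlinearity, a network built entirely from such layers is homogeneous in the sense the abstract suggests.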


Bibliographic Details
Main Authors: Gan, Yulu; Poggio, Tomaso
Format: Article (Technical Report / Working Paper)
Series: CBMM Memo; 143
Institution: Massachusetts Institute of Technology
Published: Center for Brains, Minds and Machines (CBMM), 2023
Online Access: https://hdl.handle.net/1721.1/152178
Funding: This material is based upon work supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216.