EmbedFormer: Embedded Depth-Wise Convolution Layer for Token Mixing

Visual Transformers (ViTs) have shown impressive performance due to their powerful coding ability to catch spatial and channel information. MetaFormer gives us a general architecture of transformers consisting of a token mixer and a channel mixer through which we can generally understand how transfo...

Full description

Bibliographic Details
Main Authors: Zeji Wang, Xiaowei He, Yi Li, Qinliang Chuai
Format: Article
Language:English
Published: MDPI AG 2022-12-01
Series:Sensors
Subjects:
Online Access:https://www.mdpi.com/1424-8220/22/24/9854