EmbedFormer: Embedded Depth-Wise Convolution Layer for Token Mixing

Vision Transformers (ViTs) have shown impressive performance owing to their powerful ability to encode spatial and channel information. MetaFormer provides a general transformer architecture consisting of a token mixer and a channel mixer, through which we can broadly understand how transfo...
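The abstract describes MetaFormer's split into a token mixer and a channel mixer. The sketch below shows one such block in pure Python. It assumes a plain 3x3 depth-wise convolution as the token mixer, which the title suggests, but it is not the authors' EmbedFormer layer, whose details are truncated in this record. It also assumes a two-layer per-position MLP as the channel mixer. Normalization layers are omitted, ReLU stands in for GELU, and all function names are hypothetical.

```python
from typing import List

# Hypothetical layout: a feature map is indexed as x[channel][row][col].
Tensor = List[List[List[float]]]


def zeros_like(x: Tensor) -> Tensor:
    return [[[0.0] * len(x[0][0]) for _ in x[0]] for _ in x]


def depthwise_conv3x3(x: Tensor, kernels: Tensor) -> Tensor:
    """Token mixer (assumed): one 3x3 kernel per channel, zero padding, stride 1.

    Each output position mixes its spatial neighbours (tokens) within the
    same channel only; channels never interact here.
    """
    c, h, w = len(x), len(x[0]), len(x[0][0])
    out = zeros_like(x)
    for ch in range(c):
        k = kernels[ch]
        for i in range(h):
            for j in range(w):
                acc = 0.0
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        ii, jj = i + di, j + dj
                        if 0 <= ii < h and 0 <= jj < w:  # zero padding
                            acc += k[di + 1][dj + 1] * x[ch][ii][jj]
                out[ch][i][j] = acc
    return out


def channel_mlp(x: Tensor, w1: List[List[float]], w2: List[List[float]]) -> Tensor:
    """Channel mixer (assumed): Linear(C->D), ReLU, Linear(D->C) at every position.

    w1 has shape [D][C] and w2 has shape [C][D]; positions never interact here.
    """
    c, h, w = len(x), len(x[0]), len(x[0][0])
    out = zeros_like(x)
    for i in range(h):
        for j in range(w):
            v = [x[ch][i][j] for ch in range(c)]
            hidden = [max(0.0, sum(row[ch] * v[ch] for ch in range(c))) for row in w1]
            for ch in range(c):
                out[ch][i][j] = sum(w2[ch][d] * hidden[d] for d in range(len(hidden)))
    return out


def add(a: Tensor, b: Tensor) -> Tensor:
    return [[[p + q for p, q in zip(ra, rb)] for ra, rb in zip(ca, cb)]
            for ca, cb in zip(a, b)]


def metaformer_block(x: Tensor, dw_kernels: Tensor,
                     w1: List[List[float]], w2: List[List[float]]) -> Tensor:
    """X <- X + TokenMixer(X); X <- X + ChannelMLP(X), normalization omitted."""
    y = add(x, depthwise_conv3x3(x, dw_kernels))  # token mixing + residual
    return add(y, channel_mlp(y, w1, w2))         # channel mixing + residual


if __name__ == "__main__":
    x = [[[1.0, 2.0], [3.0, 4.0]]]  # one channel, 2x2 map
    identity = [[[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]]
    print(metaformer_block(x, identity, [[0.0]], [[0.0]]))
    # -> [[[2.0, 4.0], [6.0, 8.0]]]
```

The depth-wise convolution only mixes information across spatial positions, while the MLP only mixes across channels, which mirrors the token-mixer/channel-mixer division the abstract attributes to MetaFormer.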

Bibliographic Details
Main Authors:	Zeji Wang, Xiaowei He, Yi Li, Qinliang Chuai
Format:	Article
Language:	English
Published:	MDPI AG 2022-12-01
Series:	Sensors
Subjects:	deep learning; computer vision; CNN; vision transformer
Online Access:	https://www.mdpi.com/1424-8220/22/24/9854
