Vision transformers: from semantic segmentation to dense prediction

The emergence of vision transformers (ViTs) in image classification has shifted the methodologies for visual representation learning. In particular, ViTs learn visual representations with a full receptive field at every layer across all image patches, in contrast to the increasing receptive fields of CN...

Bibliographic Details
Main Authors: Zhang, L; Lu, J; Zheng, S; Zhao, X; Zhu, X; Fu, Y; Xiang, T; Feng, J; Torr, PHS
Format: Journal article
Language: English
Published: Springer 2024