Vision transformers: from semantic segmentation to dense prediction
The emergence of vision transformers (ViTs) in image classification has shifted the methodologies for visual representation learning. In particular, ViTs learn visual representations with a full receptive field at every layer across all image patches, in contrast to the increasing receptive fields of CN...
Main Authors: | , , , , , , , , |
---|---|
Format: | Journal article |
Language: | English |
Published: | Springer, 2024 |