Searching for Efficient Multi-Stage Vision Transformers
Vision Transformer (ViT) demonstrates that Transformer for natural language processing can be applied to image classification tasks and result in comparable performance to convolutional neural networks (CNN), which have been studied in computer vision for years. This naturally raises the question of...
Main Author: | |
---|---|
Other Authors: | |
Format: | Thesis |
Published: |
Massachusetts Institute of Technology
2022
|
Online Access: | https://hdl.handle.net/1721.1/140187 |