Searching for Efficient Multi-Stage Vision Transformers

Vision Transformer (ViT) demonstrates that the Transformer, originally developed for natural language processing, can be applied to image classification tasks and achieve performance comparable to that of convolutional neural networks (CNNs), which have been studied in computer vision for years. This naturally raises the question of...

Bibliographic Details
Main Author: Liao, Yi-Lun
Other Authors: Sze, Vivienne
Format: Thesis
Published: Massachusetts Institute of Technology 2022
Online Access: https://hdl.handle.net/1721.1/140187