Transformers for computer vision

Transformer models, which are based on the self-attention mechanism, were initially introduced for natural language tasks. They require minimal inductive biases in their design and can be applied as individual processing layers in network design. In recent years, transformer models have been applied to pop...

Bibliographic Details
Main Author:	Deng, Yaojun
Other Authors:	Wang Lipo
Format:	Thesis-Master by Coursework
Language:	English
Published:	Nanyang Technological University 2022
Subjects:	Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision
Online Access:	https://hdl.handle.net/10356/154659

_version_	1826122471765442560
author	Deng, Yaojun
author2	Wang Lipo
author_facet	Wang Lipo Deng, Yaojun
author_sort	Deng, Yaojun
collection	NTU
description	Transformer models, which are based on the self-attention mechanism, were initially introduced for natural language tasks. They require minimal inductive biases in their design and can be applied as individual processing layers in network design. In recent years, transformer models have been applied to popular Computer Vision (CV) tasks and have led to significant progress. Previous surveys introduced applications of transformers to different tasks (e.g., object detection, activity recognition, and image enhancement). In this dissertation, we focus on image classification and introduce several outstanding and representative improved vision transformer models. We conduct comparisons and simulations between transformer models and several representative convolutional neural network (CNN) models to illustrate the advantages and limitations of vision transformers in CV tasks.
first_indexed	2024-10-01T05:49:01Z
format	Thesis-Master by Coursework
id	ntu-10356/154659
institution	Nanyang Technological University
language	English
last_indexed	2024-10-01T05:49:01Z
publishDate	2022
publisher	Nanyang Technological University
record_format	dspace
spelling	ntu-10356/154659 2023-07-04T16:38:15Z Transformers for computer vision Deng, Yaojun Wang Lipo School of Electrical and Electronic Engineering ELPWang@ntu.edu.sg Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision Transformer models, which are based on the self-attention mechanism, were initially introduced for natural language tasks. They require minimal inductive biases in their design and can be applied as individual processing layers in network design. In recent years, transformer models have been applied to popular Computer Vision (CV) tasks and have led to significant progress. Previous surveys introduced applications of transformers to different tasks (e.g., object detection, activity recognition, and image enhancement). In this dissertation, we focus on image classification and introduce several outstanding and representative improved vision transformer models. We conduct comparisons and simulations between transformer models and several representative convolutional neural network (CNN) models to illustrate the advantages and limitations of vision transformers in CV tasks. Master of Science (Signal Processing) 2022-01-03T07:35:26Z 2022-01-03T07:35:26Z 2021 Thesis-Master by Coursework Deng, Y. (2021). Transformers for computer vision. Master's thesis, Nanyang Technological University, Singapore. https://hdl.handle.net/10356/154659 https://hdl.handle.net/10356/154659 en ISM-DISS-02493 application/pdf Nanyang Technological University
spellingShingle	Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision Deng, Yaojun Transformers for computer vision
title	Transformers for computer vision
title_sort	transformers for computer vision
topic	Engineering::Computer science and engineering::Computing methodologies::Image processing and computer vision
url	https://hdl.handle.net/10356/154659
work_keys_str_mv	AT dengyaojun transformersforcomputervision
