Transformers Meet Small Datasets
The research and application areas of transformers have been greatly enlarged by the success of vision transformers (ViTs). However, because they lack local content acquisition capabilities, pure transformer architectures cannot be trained directly on small datasets. In this work, we first propose a new hybrid model combining a transformer with a convolutional neural network (CNN)…
| Main Authors: | Ran Shao; Xiao-Jun Bi |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2022-01-01 |
| Series: | IEEE Access |
| Subjects: | Convolutional neural networks; small datasets; transformer; vision transformer |
| Online Access: | https://ieeexplore.ieee.org/document/9944625/ |
_version_ | 1798018966377988096 |
author | Ran Shao; Xiao-Jun Bi |
author_facet | Ran Shao; Xiao-Jun Bi |
author_sort | Ran Shao |
collection | DOAJ |
description | The research and application areas of transformers have been greatly enlarged by the success of vision transformers (ViTs). However, because they lack local content acquisition capabilities, pure transformer architectures cannot be trained directly on small datasets. In this work, we first propose a new hybrid model combining a transformer with a convolutional neural network (CNN). The proposed model improves classification performance on small datasets. This is accomplished by introducing more convolution operations into the transformer’s two core sections: 1) in place of the original multi-head attention mechanism, we design a convolutional parameter sharing multi-head attention (CPSA) block that incorporates a convolutional parameter-sharing projection into the attention mechanism; 2) the feed-forward network in each transformer encoder block is replaced with a local feed-forward network (LFFN) block that introduces a sandglass block with more depth-wise convolutions to provide more locality to the transformer. We achieve state-of-the-art results when training from scratch on four small datasets, compared with transformers and CNNs, without extensive computing resources or auxiliary training. The proposed strategy opens up new paths for applying transformers to small datasets. |
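The paper's exact architecture is not part of this record, so the following is only a minimal NumPy sketch of the two ingredients the abstract names: depth-wise convolution (the locality-injecting operation behind the LFFN's sandglass block) and a parameter-shared Q/K/V projection, which is one plausible reading of "convolutional parameter sharing projection". All function names, shapes, and the single-head simplification are assumptions, not the authors' implementation.

```python
import numpy as np

def depthwise_conv2d(x, kernels):
    """Depth-wise 3x3 convolution: each channel is filtered independently
    by its own kernel, rather than mixing channels as a standard conv does.
    x: (C, H, W) feature map; kernels: (C, 3, 3); 'same' zero padding."""
    C, H, W = x.shape
    pad = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for c in range(C):                      # one filter per channel
        for i in range(H):
            for j in range(W):
                out[c, i, j] = np.sum(pad[c, i:i + 3, j:j + 3] * kernels[c])
    return out

def shared_projection_attention(x, w_shared):
    """Single-head scaled dot-product attention where Q, K, and V share one
    projection matrix -- a hypothetical stand-in for the CPSA block's
    parameter sharing (the paper's actual design may differ).
    x: (N, d) token embeddings; w_shared: (d, d)."""
    q = k = v = x @ w_shared                # shared parameters for Q/K/V
    scores = q @ k.T / np.sqrt(x.shape[1])  # scaled dot-product scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v

rng = np.random.default_rng(0)
fmap = rng.standard_normal((8, 16, 16))
y = depthwise_conv2d(fmap, rng.standard_normal((8, 3, 3)))
tokens = rng.standard_normal((4, 8))
z = shared_projection_attention(tokens, rng.standard_normal((8, 8)))
print(y.shape, z.shape)   # (8, 16, 16) (4, 8)
```

Depth-wise convolution keeps the parameter count at C·k·k instead of C²·k·k, which is why it is a common choice for adding locality to transformers cheaply; sharing one projection across Q, K, and V similarly cuts attention parameters to a third.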
first_indexed | 2024-04-11T16:31:55Z |
format | Article |
id | doaj.art-d3e57a62f0cf4470a53774a0c2b282b2 |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-04-11T16:31:55Z |
publishDate | 2022-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-d3e57a62f0cf4470a53774a0c2b282b2 (indexed 2022-12-22T04:13:59Z) | IEEE Access, ISSN 2169-3536, 2022-01-01, vol. 10, pp. 118454–118464, doi: 10.1109/ACCESS.2022.3221138, IEEE document 9944625. Transformers Meet Small Datasets. Ran Shao (https://orcid.org/0000-0002-8462-8721), College of Information and Communication Engineering, Harbin Engineering University, Harbin, China; Xiao-Jun Bi (https://orcid.org/0000-0002-5382-1000), Department of Information Engineering, Minzu University of China, Beijing, China. https://ieeexplore.ieee.org/document/9944625/ Keywords: Convolutional neural networks; small datasets; transformer; vision transformer |
spellingShingle | Ran Shao; Xiao-Jun Bi; Transformers Meet Small Datasets; IEEE Access; Convolutional neural networks; small datasets; transformer; vision transformer |
title | Transformers Meet Small Datasets |
title_full | Transformers Meet Small Datasets |
title_fullStr | Transformers Meet Small Datasets |
title_full_unstemmed | Transformers Meet Small Datasets |
title_short | Transformers Meet Small Datasets |
title_sort | transformers meet small datasets |
topic | Convolutional neural networks; small datasets; transformer; vision transformer |
url | https://ieeexplore.ieee.org/document/9944625/ |
work_keys_str_mv | AT ranshao transformersmeetsmalldatasets AT xiaojunbi transformersmeetsmalldatasets |