Improved Image Classification With Token Fusion

In this paper, we propose a method that improves image classification performance by fusing CNN and transformer structures. A CNN extracts information about local image regions well, but its ability to extract global information is limited. On the other hand, the transformer has...

Bibliographic Details
Main Authors: Keong-Hun Choi, Jin-Woo Kim, Yao Wang, Jong-Eun Ha
Format: Article
Language: English
Published: IEEE 2023-01-01
Series: IEEE Access
Subjects: Image classification; transformer; convolutional neural networks; deep learning
Online Access: https://ieeexplore.ieee.org/document/10171338/
author Keong-Hun Choi
Jin-Woo Kim
Yao Wang
Jong-Eun Ha
collection DOAJ
description In this paper, we propose a method that improves image classification performance by fusing CNN and transformer structures. A CNN extracts information about local image regions well, but its ability to extract global information is limited. On the other hand, the transformer has an advantage in global information extraction but requires much more memory than a CNN. We apply a CNN to an image and treat the feature vector at each pixel of the resulting feature map as a token. At the same time, the image is divided into patches, and each patch is treated as a token, as in a transformer. The CNN tokens and the transformer tokens are better at capturing local and global information, respectively. We hypothesize that combining these two types of tokens yields improved characteristics, and we show this through experiments. We propose three methods for fusing tokens with different characteristics: (1) late token fusion with a parallel structure, (2) early token fusion, and (3) layer-by-layer token fusion. The proposed method shows the best classification performance in experiments using ImageNet-1K.
first_indexed 2024-03-13T00:17:55Z
format Article
id doaj.art-f09dd9249dd447f39d73ee2547e94916
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-03-13T00:17:55Z
publishDate 2023-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-f09dd9249dd447f39d73ee2547e94916
2023-07-11T23:00:49Z
eng
IEEE
IEEE Access
2169-3536
2023-01-01
11
67460
67467
10.1109/ACCESS.2023.3291597
10171338
Improved Image Classification With Token Fusion
Keong-Hun Choi (0)
Jin-Woo Kim (1)
Yao Wang (2)
Jong-Eun Ha (3)
https://orcid.org/0000-0002-4144-1000
Graduate School of Automotive Engineering, Seoul National University of Science and Technology, Seoul, South Korea
Graduate School of Automotive Engineering, Seoul National University of Science and Technology, Seoul, South Korea
Graduate School of Automotive Engineering, Seoul National University of Science and Technology, Seoul, South Korea
Department of Mechanical and Automotive Engineering, Seoul National University of Science and Technology, Seoul, South Korea
https://ieeexplore.ieee.org/document/10171338/
Image classification
transformer
convolutional neural networks
deep learning
title Improved Image Classification With Token Fusion
topic Image classification
transformer
convolutional neural networks
deep learning
url https://ieeexplore.ieee.org/document/10171338/
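
The abstract in this record describes two token sources (per-pixel feature vectors from a CNN feature map, and image patches) and names early token fusion as one way to combine them. The sketch below illustrates that idea under stated assumptions only; it is not the authors' implementation. The average-pooling "backbone", the zero-pad/truncate "projection", the choice of concatenation along the sequence axis, and every function name are hypothetical stand-ins chosen so the example runs with the Python standard library alone.

```python
# Illustrative sketch only. All names and design choices here are assumptions,
# not taken from the paper described in this record.

def patch_tokens(image, patch):
    """Split an H x W x C image (nested lists) into non-overlapping
    patch x patch blocks and flatten each block into one token."""
    h, w = len(image), len(image[0])
    tokens = []
    for top in range(0, h - patch + 1, patch):
        for left in range(0, w - patch + 1, patch):
            token = []
            for y in range(top, top + patch):
                for x in range(left, left + patch):
                    token.extend(image[y][x])
            tokens.append(token)
    return tokens


def toy_feature_map(image, stride):
    """Placeholder for a real CNN backbone: average-pool stride x stride
    windows to produce a smaller H' x W' x C feature map."""
    h, w, c = len(image), len(image[0]), len(image[0][0])
    fmap = []
    for top in range(0, h - stride + 1, stride):
        row = []
        for left in range(0, w - stride + 1, stride):
            vec = [0.0] * c
            for y in range(top, top + stride):
                for x in range(left, left + stride):
                    for k in range(c):
                        vec[k] += image[y][x][k] / (stride * stride)
            row.append(vec)
        fmap.append(row)
    return fmap


def feature_map_tokens(feature_map):
    """Treat the feature vector at each feature-map pixel as one token."""
    return [list(vec) for row in feature_map for vec in row]


def project(tokens, dim):
    """Stand-in for a learned linear projection: zero-pad or truncate
    every token to a common length dim."""
    return [(t + [0.0] * dim)[:dim] for t in tokens]


def early_fusion(cnn_tokens, vit_tokens, dim):
    """Early token fusion, assumed here to mean concatenating both token
    sets into one sequence before any transformer layer is applied."""
    return project(cnn_tokens, dim) + project(vit_tokens, dim)


# Tiny 4 x 4 RGB image whose pixel value equals its raster index.
image = [[[float(y * 4 + x)] * 3 for x in range(4)] for y in range(4)]
vit = patch_tokens(image, 2)                          # 4 tokens of length 12
cnn = feature_map_tokens(toy_feature_map(image, 2))   # 4 tokens of length 3
fused = early_fusion(cnn, vit, 8)
print(len(fused), len(fused[0]))  # prints "8 8"
```

In a real model the fused sequence would be passed to transformer layers; late and layer-by-layer fusion would differ in where the two token streams meet, which this sketch does not attempt to show.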