Improved Image Classification With Token Fusion
In this paper, we propose a method to improve image classification performance by fusing CNN and transformer structures. A CNN extracts information about local areas of an image well, but its ability to extract global information is limited. On the other hand, the transformer has...
Main Authors: | Keong-Hun Choi, Jin-Woo Kim, Yao Wang, Jong-Eun Ha |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE 2023-01-01 |
Series: | IEEE Access |
Subjects: | Image classification; transformer; convolutional neural networks; deep learning |
Online Access: | https://ieeexplore.ieee.org/document/10171338/ |
_version_ | 1797782860531236864 |
---|---|
author | Keong-Hun Choi; Jin-Woo Kim; Yao Wang; Jong-Eun Ha |
author_facet | Keong-Hun Choi; Jin-Woo Kim; Yao Wang; Jong-Eun Ha |
author_sort | Keong-Hun Choi |
collection | DOAJ |
description | In this paper, we propose a method to improve image classification performance by fusing CNN and transformer structures. A CNN extracts information about local areas of an image well, but its ability to extract global information is limited. On the other hand, the transformer has an advantage in global information extraction, but it requires more memory than a CNN. We apply a CNN to an image and treat the feature vector at each pixel of the resulting feature map as a token. At the same time, the image is divided into patches, and each patch is treated as a token, as in a transformer. CNN tokens and transformer tokens are advantageous for extracting local and global information, respectively. We hypothesize that combining these two types of tokens yields improved characteristics, and we show this through experiments. We propose three methods for fusing tokens with different characteristics: (1) late token fusion with a parallel structure, (2) early token fusion, and (3) layer-by-layer token fusion. The proposed method shows the best classification performance in experiments using ImageNet-1K. |
first_indexed | 2024-03-13T00:17:55Z |
format | Article |
id | doaj.art-f09dd9249dd447f39d73ee2547e94916 |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-03-13T00:17:55Z |
publishDate | 2023-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-f09dd9249dd447f39d73ee2547e94916; 2023-07-11T23:00:49Z; eng; IEEE; IEEE Access; 2169-3536; 2023-01-01; 11; 67460; 67467; 10.1109/ACCESS.2023.3291597; 10171338; Improved Image Classification With Token Fusion; Keong-Hun Choi (0); Jin-Woo Kim (1); Yao Wang (2); Jong-Eun Ha (3), https://orcid.org/0000-0002-4144-1000; Graduate School of Automotive Engineering, Seoul National University of Science and Technology, Seoul, South Korea; Graduate School of Automotive Engineering, Seoul National University of Science and Technology, Seoul, South Korea; Graduate School of Automotive Engineering, Seoul National University of Science and Technology, Seoul, South Korea; Department of Mechanical and Automotive Engineering, Seoul National University of Science and Technology, Seoul, South Korea; https://ieeexplore.ieee.org/document/10171338/; Image classification; transformer; convolutional neural networks; deep learning |
title | Improved Image Classification With Token Fusion |
topic | Image classification; transformer; convolutional neural networks; deep learning |
url | https://ieeexplore.ieee.org/document/10171338/ |
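
The description in this record outlines two ways of turning an image into tokens: each pixel of a CNN feature map becomes a token, and each image patch becomes a token. The following is a minimal Python sketch of that idea, not the authors' implementation. The function names (`patch_tokens`, `cnn_tokens`, `fuse_tokens`), the toy 4x4 image, and the hand-written feature map standing in for real CNN output are all hypothetical. Fusion is assumed here to be plain concatenation of the two token sequences; the record does not say how the paper's late, early, or layer-by-layer fusion is actually carried out.

```python
from typing import List

Token = List[float]


def patch_tokens(image: List[List[float]], patch: int) -> List[Token]:
    """Split a single-channel H x W image into non-overlapping patch x patch
    blocks and flatten each block (row by row) into one token."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tokens.append([image[top + dy][left + dx]
                           for dy in range(patch)
                           for dx in range(patch)])
    return tokens


def cnn_tokens(feature_map: List[List[Token]]) -> List[Token]:
    """Treat the channel vector at each spatial position of an
    H' x W' x C feature map as one token."""
    return [list(pixel) for row in feature_map for pixel in row]


def fuse_tokens(first: List[Token], second: List[Token]) -> List[Token]:
    """ASSUMPTION: fuse by concatenating the two token sequences.
    Both token types must share the same dimension; a real model would
    typically project them to a common width first."""
    dims = {len(token) for token in first + second}
    if len(dims) != 1:
        raise ValueError("all tokens must have the same dimension")
    return first + second


# Toy 4x4 image with pixel value = row * 4 + column.
image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]

# Hypothetical 2x2 feature map with 4 channels, standing in for CNN output.
feature_map = [[[0.1] * 4, [0.2] * 4],
               [[0.3] * 4, [0.4] * 4]]

p_tokens = patch_tokens(image, patch=2)   # 4 tokens of dimension 4
c_tokens = cnn_tokens(feature_map)        # 4 tokens of dimension 4
fused = fuse_tokens(c_tokens, p_tokens)   # 8 tokens of dimension 4
print(len(fused), len(fused[0]))
```

In this sketch the fused sequence would be what a transformer encoder consumes; where in the network the concatenation happens is what would distinguish the fusion variants named in the description.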