Re-Introducing BN Into Transformers for Vision Tasks

In recent years, Transformer-based models have exhibited significant advancements over previous models in natural language processing and vision tasks. This powerful methodology has also been extended to the 3D point cloud domain, where it can mitigate the inherent difficulties posed by the irregula...

Bibliographic Details
Main Authors: Xue-Song Tang, Xian-Lin Xie
Format: Article
Language: English
Published: IEEE 2023-01-01
Series: IEEE Access
Subjects: Point clouds, normalization, transformers, 3D feature extraction
Online Access: https://ieeexplore.ieee.org/document/10145769/
_version_ 1827924852921597952
author Xue-Song Tang
Xian-Lin Xie
author_facet Xue-Song Tang
Xian-Lin Xie
author_sort Xue-Song Tang
collection DOAJ
description In recent years, Transformer-based models have exhibited significant advancements over previous models in natural language processing and vision tasks. This powerful methodology has also been extended to the 3D point cloud domain, where it can mitigate the inherent difficulties posed by the irregular and disorderly nature of the point clouds. However, the attention mechanism within the Transformer presents challenges for utilizing Batch Normalization (BN), as statistical information cannot be extracted efficiently from the data set. Thus, this study proposes a novel residual structure, ResBN, which can effectively handle 3D data. Additionally, to replace BN in the transformer for 2D image processing, we introduce the Patch Normalization (PN) technique. ResBN and PN are evaluated on 3D point cloud and 2D image datasets respectively through statistical experiments, demonstrating their efficacy in enhancing classification performance.
first_indexed 2024-03-13T05:16:45Z
format Article
id doaj.art-ff72bdaedd7841f7ae35256bc49d432d
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-03-13T05:16:45Z
publishDate 2023-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-ff72bdaedd7841f7ae35256bc49d432d
2023-06-15T23:00:49Z
eng
IEEE
IEEE Access
2169-3536
2023-01-01
11
58462
58469
10.1109/ACCESS.2023.3283612
10145769
Re-Introducing BN Into Transformers for Vision Tasks
Xue-Song Tang 0
https://orcid.org/0000-0002-7594-2241
Xian-Lin Xie 1
College of Information Science and Technology, Donghua University, Shanghai, China
College of Information Science and Technology, Donghua University, Shanghai, China
In recent years, Transformer-based models have exhibited significant advancements over previous models in natural language processing and vision tasks. This powerful methodology has also been extended to the 3D point cloud domain, where it can mitigate the inherent difficulties posed by the irregular and disorderly nature of the point clouds. However, the attention mechanism within the Transformer presents challenges for utilizing Batch Normalization (BN), as statistical information cannot be extracted efficiently from the data set. Thus, this study proposes a novel residual structure, ResBN, which can effectively handle 3D data. Additionally, to replace BN in the transformer for 2D image processing, we introduce the Patch Normalization (PN) technique. ResBN and PN are evaluated on 3D point cloud and 2D image datasets respectively through statistical experiments, demonstrating their efficacy in enhancing classification performance.
https://ieeexplore.ieee.org/document/10145769/
Point clouds
normalization
transformers
3D feature extraction
spellingShingle Xue-Song Tang
Xian-Lin Xie
Re-Introducing BN Into Transformers for Vision Tasks
IEEE Access
Point clouds
normalization
transformers
3D feature extraction
title Re-Introducing BN Into Transformers for Vision Tasks
title_sort re introducing bn into transformers for vision tasks
topic Point clouds
normalization
transformers
3D feature extraction
url https://ieeexplore.ieee.org/document/10145769/
work_keys_str_mv AT xuesongtang reintroducingbnintotransformersforvisiontasks
AT xianlinxie reintroducingbnintotransformersforvisiontasks