Cross-Parallel Transformer: Parallel ViT for Medical Image Segmentation

Bibliographic Details
Main Authors: Dong Wang, Zixiang Wang, Ling Chen, Hongfeng Xiao, Bo Yang
Format: Article
Language: English
Published: MDPI AG, 2023-11-01
Series: Sensors
Online Access: https://www.mdpi.com/1424-8220/23/23/9488
Description
Summary: Medical image segmentation commonly relies on hybrid models that combine a Convolutional Neural Network (CNN) with sequentially stacked Transformers. The Transformers use multi-head self-attention (MHSA) to model global context. Despite their success in semantic segmentation, these models extract features inefficiently and demand substantial computational resources, which hinders the network's robustness. To address this, the study proposes two models: PTransUNet (PT model) and C-PTransUNet (C-PT model). The C-PT module refines the Vision Transformer by replacing its sequential design with a parallel one. It strengthens the feature extraction of MHSA through self-correlated feature attention and channel feature interaction, and it streamlines the Feed-Forward Network (FFN) to reduce computation. On the public Synapse dataset, the PT and C-PT models improve the Dice similarity coefficient (DSC) by 0.87% and 3.25%, respectively, over the baseline model. The PT model matches the baseline in parameter count and FLOPs. The C-PT model reduces the parameter count by 29% and FLOPs by 21.4% relative to the baseline. The proposed models therefore offer gains in both accuracy and efficiency.
ISSN: 1424-8220
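The summary distinguishes a sequential Vision Transformer design from a parallel one. As a rough illustration of that distinction, the equations below contrast a standard pre-norm ViT encoder layer with a generic parallel layer in which MHSA and the FFN both read the same input.

This is an assumed textbook formulation, not the C-PT module itself. It does not model self-correlated feature attention, channel feature interaction, or the streamlined FFN. The record also does not state whether the PT model uses exactly this form.

The following symbols are introduced here for illustration only:
- $\mathbf{z}$: token sequence
- $d$: embedding width
- $h$: FFN hidden width
- $\operatorname{LN}$: layer normalization

```latex
% Sequential (standard pre-norm ViT encoder layer)
\begin{aligned}
\mathbf{z}' &= \mathbf{z} + \operatorname{MHSA}\big(\operatorname{LN}_1(\mathbf{z})\big),\\
\mathbf{z}_{\text{out}} &= \mathbf{z}' + \operatorname{FFN}\big(\operatorname{LN}_2(\mathbf{z}')\big).
\end{aligned}

% Parallel (assumed generic form): both branches read the same input z
\mathbf{z}_{\text{out}} = \mathbf{z}
  + \operatorname{MHSA}\big(\operatorname{LN}_1(\mathbf{z})\big)
  + \operatorname{FFN}\big(\operatorname{LN}_2(\mathbf{z})\big).

% Parameter counts (with biases), identical in both layouts
P_{\text{MHSA}} = 4d^2 + 4d, \qquad
P_{\text{FFN}} = 2dh + h + d, \qquad
P_{\operatorname{LN}_1} = P_{\operatorname{LN}_2} = 2d.
```

In this generic form, both layouts use the same weight matrices and perform the same operations on same-shaped inputs. Rearranging a sequential layer into a parallel one therefore leaves the parameter count and FLOPs unchanged. That would be consistent with the PT model matching the baseline on both measures. Reductions like the C-PT model's must come from other changes, such as the FFN streamlining the summary mentions (for instance, a smaller hidden width $h$, which is an illustrative assumption).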