Efficient Scopeformer: Toward Scalable and Rich Feature Extraction for Intracranial Hemorrhage Detection

The quality and richness of feature maps extracted by convolutional neural networks (CNNs) and vision Transformers (ViTs) directly relate to robust model performance. In medical computer vision, these information-rich features are crucial for detecting rare cases within large datasets. This work p...

Bibliographic Details
Main Authors:	Yassine Barhoumi, Nidhal Carla Bouaynaya, Ghulam Rasool
Format:	Article
Language:	English
Published:	IEEE 2023-01-01
Series:	IEEE Access
Subjects:	Computed tomography (CT); intracranial hemorrhage; medical imaging; convolutional neural networks; vision transformers; feature maps
Online Access:	https://ieeexplore.ieee.org/document/10201867/

_version_	1797742991473901568
author	Yassine Barhoumi, Nidhal Carla Bouaynaya, Ghulam Rasool
collection	DOAJ
description	The quality and richness of feature maps extracted by convolutional neural networks (CNNs) and vision Transformers (ViTs) directly relate to robust model performance. In medical computer vision, these information-rich features are crucial for detecting rare cases within large datasets. This work presents the “Scopeformer,” a novel multi-CNN-ViT model for intracranial hemorrhage classification in computed tomography (CT) images. The Scopeformer architecture is scalable and modular, which allows utilizing various CNN architectures as the backbone with diversified output features and pre-training strategies. We propose effective feature projection methods to reduce redundancies among CNN-generated features and to control the input size of ViTs. Extensive experiments with various Scopeformer models show that the model performance is proportional to the number of convolutional blocks employed in the feature extractor. Using multiple strategies, including diversifying the pre-training paradigms for CNNs, different pre-training datasets, and style transfer techniques, we demonstrate an overall improvement in the model performance at various computational budgets. Later, we propose smaller compute-efficient Scopeformer versions with three different types of input and output ViT configurations. Efficient Scopeformers use four different pre-trained CNN architectures as feature extractors to increase feature richness. Our best Efficient Scopeformer model achieved an accuracy of 96.94% and a weighted logarithmic loss of 0.083 with an eightfold reduction in the number of trainable parameters compared to the base Scopeformer. Another version of the Efficient Scopeformer model further reduced the parameter space by almost 17 times with negligible performance reduction. In summary, our work showed that the hybrid architectures consisting of CNNs and ViTs might provide the desired feature richness for developing accurate medical computer vision models.
first_indexed	2024-03-12T14:48:09Z
format	Article
id	doaj.art-bff6ace16fb9498787579ecce5576847
institution	Directory Open Access Journal
issn	2169-3536
language	English
last_indexed	2024-03-12T14:48:09Z
publishDate	2023-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
spelling	doaj.art-bff6ace16fb9498787579ecce5576847; 2023-08-15T23:00:57Z; eng; IEEE; IEEE Access; 2169-3536; 2023-01-01; 11; 81656; 81671; 10.1109/ACCESS.2023.3301160; 10201867; Efficient Scopeformer: Toward Scalable and Rich Feature Extraction for Intracranial Hemorrhage Detection; Yassine Barhoumi (0), https://orcid.org/0000-0003-4451-5930; Nidhal Carla Bouaynaya (1), https://orcid.org/0000-0002-8833-8414; Ghulam Rasool (2), https://orcid.org/0000-0001-8551-0090; (0) Electrical and Computer Science Department, Rowan University, Glassboro, NJ, USA; (1) Electrical and Computer Science Department, Rowan University, Glassboro, NJ, USA; (2) Department of Machine Learning, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, USA; https://ieeexplore.ieee.org/document/10201867/; Computed tomography (CT); intracranial hemorrhage; medical imaging; convolutional neural networks; vision transformers; feature maps
title	Efficient Scopeformer: Toward Scalable and Rich Feature Extraction for Intracranial Hemorrhage Detection
topic	Computed tomography (CT); intracranial hemorrhage; medical imaging; convolutional neural networks; vision transformers; feature maps
url	https://ieeexplore.ieee.org/document/10201867/

Similar Items