Detecting Tuberculosis-Consistent Findings in Lateral Chest X-Rays Using an Ensemble of CNNs and Vision Transformers

Bibliographic Details
Main Authors: Sivaramakrishnan Rajaraman, Ghada Zamzmi, Les R. Folio, Sameer Antani
Format: Article
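Language: English
Published: Frontiers Media S.A. 2022-02-01
Series: Frontiers in Genetics
Subjects: chest radiographs; CNN; deep learning; tuberculosis classification and localization; vision transformers; ensemble learning
Online Access: https://www.frontiersin.org/articles/10.3389/fgene.2022.864724/full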
_version_ 1818872359507984384
author Sivaramakrishnan Rajaraman
Ghada Zamzmi
Les R. Folio
Sameer Antani
collection DOAJ
description Research on detecting tuberculosis (TB) findings on chest radiographs (chest X-rays, CXRs) using convolutional neural networks (CNNs) has demonstrated superior performance, owing to the emergence of publicly available, large-scale datasets with expert annotations and the availability of scalable computational resources. However, these studies use only the frontal CXR projections, i.e., the posterior-anterior (PA) and anterior-posterior (AP) views, for analysis and decision-making. Lateral CXRs, which have not been studied to date, help detect clinically suspected pulmonary TB, particularly in children. Further, Vision Transformers (ViTs) with built-in self-attention mechanisms have recently emerged as a viable alternative to traditional CNNs. Although ViTs have demonstrated notable performance in several medical image analysis tasks, the CNN and ViT models each have potential limitations in performance and computational efficiency, necessitating a comprehensive analysis to select appropriate models for the problem under study. This study aims to detect TB-consistent findings in lateral CXRs by constructing an ensemble of CNN and ViT models. Several models are first trained on lateral CXR data extracted from two large public collections to transfer modality-specific knowledge and are then fine-tuned to detect findings consistent with TB. We observed that the weighted averaging ensemble of the predictions of the CNN and ViT models, using the optimal weights computed with the Sequential Least-Squares Quadratic Programming method, delivered significantly superior performance (MCC: 0.8136, 95% confidence interval (CI): 0.7394, 0.8878, p < 0.05) compared to the individual models and other ensembles. We also interpreted the decisions of the CNN and ViT models using class-selective relevance maps and attention maps, respectively, and combined them to highlight the discriminative image regions contributing to the final output. We observed that (i) the model accuracy is not related to disease region of interest (ROI) localization and (ii) the bitwise-AND of the heatmaps of the top-2-performing models delivered significantly superior ROI localization performance in terms of mean average precision [mAP@(0.1 0.6) = 0.1820, 95% CI: 0.0771, 0.2869, p < 0.05] compared to other individual models and ensembles. The code is available at https://github.com/sivaramakrishnan-rajaraman/Ensemble-of-CNN-and-ViT-for-TB-detection-in-lateral-CXR.
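The weighted averaging step described above can be illustrated with a minimal sketch. It is not the authors' implementation (see their repository for that). It assumes that each model outputs a probability of TB for every validation image, that the weights are constrained to be non-negative and to sum to 1, and that the objective being minimized is the validation log loss. The synthetic data, the three noise levels, and the function names `log_loss`, `fit_ensemble_weights`, and `mcc` are all hypothetical and exist only for illustration.

```python
import numpy as np
from scipy.optimize import minimize


def log_loss(y_true, p, eps=1e-12):
    """Binary cross-entropy between labels and predicted probabilities."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p)))


def fit_ensemble_weights(probs, y_true):
    """Find weights in [0, 1] that sum to 1 and minimize the log loss.

    probs has shape (n_models, n_samples); row k holds model k's P(TB).
    """
    n_models = probs.shape[0]
    w0 = np.full(n_models, 1.0 / n_models)  # start from simple averaging
    result = minimize(
        lambda w: log_loss(y_true, w @ probs),
        w0,
        method="SLSQP",
        bounds=[(0.0, 1.0)] * n_models,
        constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}],
    )
    return result.x


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0 if undefined)."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return float((tp * tn - fp * fn) / denom) if denom > 0 else 0.0


# Synthetic stand-in for validation predictions (assumption: not real CXR data).
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
noise_levels = (0.15, 0.25, 0.35)  # three hypothetical models, best to worst
probs = np.stack([
    np.clip(0.5 + (y - 0.5) * 0.6 + rng.normal(0.0, s, size=y.size), 0.01, 0.99)
    for s in noise_levels
])

weights = fit_ensemble_weights(probs, y)
uniform = np.full(len(noise_levels), 1.0 / len(noise_levels))
ensemble_pred = (weights @ probs >= 0.5).astype(int)

print("weights:", np.round(weights, 3))
print("log loss (uniform):", log_loss(y, uniform @ probs))
print("log loss (SLSQP):  ", log_loss(y, weights @ probs))
print("MCC (SLSQP ensemble):", mcc(y, ensemble_pred))
```

In practice the weights would be fitted on held-out validation predictions and then applied unchanged to the test set, so that the reported MCC is not biased by the optimization.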
first_indexed 2024-12-19T12:37:34Z
format Article
id doaj.art-c8e6637e4c8f48839d7eafbdaf28aae4
institution Directory Open Access Journal
issn 1664-8021
language English
last_indexed 2024-12-19T12:37:34Z
publishDate 2022-02-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Genetics
spelling doaj.art-c8e6637e4c8f48839d7eafbdaf28aae4
2022-12-21T20:21:06Z
eng
Frontiers Media S.A.
Frontiers in Genetics
1664-8021
2022-02-01
13
10.3389/fgene.2022.864724
864724
Detecting Tuberculosis-Consistent Findings in Lateral Chest X-Rays Using an Ensemble of CNNs and Vision Transformers
Sivaramakrishnan Rajaraman (Computational Health Research Branch, National Library of Medicine, National Institutes of Health, Bethesda, MD, United States)
Ghada Zamzmi (Computational Health Research Branch, National Library of Medicine, National Institutes of Health, Bethesda, MD, United States)
Les R. Folio (Moffitt Cancer Center, Tampa, FL, United States)
Sameer Antani (Computational Health Research Branch, National Library of Medicine, National Institutes of Health, Bethesda, MD, United States)
title Detecting Tuberculosis-Consistent Findings in Lateral Chest X-Rays Using an Ensemble of CNNs and Vision Transformers
topic chest radiographs
CNN
deep learning
tuberculosis classification and localization
vision transformers
ensemble learning
url https://www.frontiersin.org/articles/10.3389/fgene.2022.864724/full