TS-CNN: A Three-Tier Self-Interpretable CNN for Multi-Region Medical Image Classification

Bibliographic Details
Main Authors: V. A. Ashwath, O. K. Sikha, Raul Benitez
Format: Article
Language: English
Published: IEEE, 2023-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10197361/
Description
Summary: Medical image classification is a critical task in which reliability and transparency are essential for the safe and accurate diagnosis of diseases. Deep Convolutional Neural Networks (DCNNs) are widely used in medical image classification because of their high performance. However, they are often considered black boxes because they offer little insight into how they reach their decisions. Improving the interpretability of DCNNs is therefore crucial for their adoption in medical diagnosis. This paper proposes a novel three-tier self-interpretable DCNN (TS-CNN) architecture for multi-region medical image classification that improves classification performance while being inherently interpretable. The TS-CNN architecture is well suited to medical images with multiple regions of interest, such as images with scattered, randomly shaped lesions. The architecture has three branches: a global branch that learns relevant patterns from the raw input image; an attention branch that selects the important regions and discards the irrelevant parts, passing the result to a local branch for learning; and a fusion branch that distills knowledge from both the global and local branches for classification. The architecture is flexible with respect to both the backbone CNNs used for classification and the post-hoc interpretability methods used for attention capture. We demonstrate this flexibility and the generalization of the architecture through a series of experiments using multiple state-of-the-art CNN architectures, such as DenseNet-121, Inception, Xception, and ResNet-50, as the global/local branches, each paired with GradCAM and saliency maps as attention modules. The proposed architecture outperformed the backbone models in classification on two datasets, a custom-made blob dataset and the publicly available PAD-UFES-20 skin lesion dataset, demonstrating its potential for improving accuracy in medical image classification tasks.
The code related to this work is available at: https://github.com/sikha2552/TS-CNN-A-Three-Tier-Self-Interpretable-CNN-for-Medical-Image-Classification-Empowered-with-Post-hoc.git
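The summary does not give implementation details, so what follows is only a minimal, framework-free Python sketch of the data flow it describes, under stated assumptions. The attention map (standing in for a GradCAM or saliency map) is binarized by keeping pixels at or above a fixed fraction `frac` of its maximum, which is a hypothetical rule. The masked image stands in for the local-branch input. The fusion branch is modeled as concatenating global and local feature vectors and then applying a linear layer and softmax. All function names and parameters are hypothetical, the CNN backbones and attention-map computation are not reproduced, and this is not the authors' implementation.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def attention_mask(saliency: Matrix, frac: float = 0.5) -> List[List[int]]:
    """Binarize an attention map (e.g. a GradCAM or saliency map).

    Hypothetical rule (an assumption, not from the paper): keep pixels whose
    score is at least `frac` times the map's maximum. All disjoint
    high-scoring regions are kept, not just one.
    """
    peak = max(max(row) for row in saliency)
    if peak <= 0:
        # No positive evidence anywhere: select nothing.
        return [[0] * len(row) for row in saliency]
    cutoff = frac * peak
    return [[1 if v >= cutoff else 0 for v in row] for row in saliency]


def apply_mask(image: Matrix, mask: List[List[int]]) -> Matrix:
    """Zero out unselected pixels; the result stands in for the local-branch input."""
    return [
        [p * m for p, m in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


def fuse_and_classify(
    global_feat: Sequence[float],
    local_feat: Sequence[float],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[float]:
    """Assumed fusion: concatenate features, apply one linear layer, then softmax."""
    fused = list(global_feat) + list(local_feat)
    logits = [
        sum(w * x for w, x in zip(w_row, fused)) + b
        for w_row, b in zip(weights, bias)
    ]
    top = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    saliency = [
        [0.1, 0.9, 0.2],
        [0.0, 0.3, 0.8],
        [0.7, 0.1, 0.0],
    ]
    image = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]
    mask = attention_mask(saliency)
    print(mask)                     # [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
    print(apply_mask(image, mask))  # [[0, 20, 0], [0, 0, 60], [70, 0, 0]]
    probs = fuse_and_classify(
        [1.0, 0.0],
        [0.0, 1.0],
        weights=[[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 2.0]],
        bias=[0.0, 0.0],
    )
    print(probs)
```

Masking, rather than cropping to a single bounding box, keeps several disjoint high-attention regions at once. This fits the multi-region setting (scattered lesions) that the architecture targets. A crop-based selection would be an equally plausible reading of the summary.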
ISSN: 2169-3536