Comparative analysis of vision transformers and conventional convolutional neural networks in detecting referable diabetic retinopathy


Bibliographic Details
Main Authors: Goh, Jocelyn Hui Lin, Ang, Elroy, Srinivasan, Sahana, Lei, Xiaofeng, Loh, Johnathan, Quek, Ten Cheer, Xue, Cancan, Xu, Xinxing, Liu, Yong, Cheng, Ching-Yu, Rajapakse, Jagath Chandana, Tham, Yih-Chung
Other Authors: School of Computer Science and Engineering
Format: Journal Article
Language: English
Published: 2024
Subjects: Computer and Information Science; Convolutional neural network; Referable diabetic retinopathy
Online Access:https://hdl.handle.net/10356/180451
Description:
Objective: Vision transformers (ViTs) have shown promising performance in various classification tasks previously dominated by convolutional neural networks (CNNs). However, the performance of ViTs in referable diabetic retinopathy (DR) detection is relatively underexplored. In this study, using retinal photographs, we evaluated the comparative performance of ViTs and CNNs in detecting referable DR.
Design: Retrospective study.
Participants: A total of 48 269 retinal images from the open-source Kaggle DR detection dataset, the Messidor-1 dataset, and the Singapore Epidemiology of Eye Diseases (SEED) study were included.
Methods: Using 41 614 retinal photographs from the Kaggle dataset, we developed 5 CNN models (Visual Geometry Group 19, ResNet50, InceptionV3, DenseNet201, and EfficientNetV2S) and 4 ViT models (VAN_small, CrossViT_small, ViT_small, and Hierarchical Vision Transformer using Shifted Windows [SWIN]_tiny) for the detection of referable DR, defined as eyes with moderate or worse DR. The comparative performance of all 9 models was evaluated on the Kaggle internal test set (1045 study eyes) and on 2 external test sets, the SEED study (5455 study eyes) and Messidor-1 (1200 study eyes).
Main Outcome Measures: Area under the receiver operating characteristic curve (AUC), specificity, and sensitivity.
Results: Among all models, the SWIN transformer achieved the highest AUC of 95.7% on the internal test set, significantly outperforming the CNN models (all P < 0.001). The same pattern held in the external test sets, where the SWIN transformer achieved AUCs of 97.3% in SEED and 96.3% in Messidor-1. With specificity fixed at 80% on the internal test set, the SWIN transformer achieved the highest sensitivity of 94.4%, significantly better than all the CNN models (sensitivities ranging from 76.3% to 83.8%; all P < 0.001). This trend was consistently observed in both external test sets.
Conclusions: Our findings demonstrate that ViTs outperform CNNs in detecting referable DR from retinal photographs. These results point to the potential of ViT models to improve and optimize retinal photograph-based deep learning for referable DR detection.
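The outcome measures described above (AUC, and sensitivity at a fixed 80% specificity level) can be sketched in plain Python. This is a minimal illustration with synthetic labels and scores, not the study's evaluation code; in practice a library such as scikit-learn (`roc_auc_score`, `roc_curve`) would typically be used.

```python
def auc(labels, scores):
    """AUC via the Mann-Whitney U statistic: the probability that a
    randomly chosen positive outranks a randomly chosen negative
    (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sensitivity_at_specificity(labels, scores, target_spec=0.80):
    """Highest sensitivity achievable at any decision threshold whose
    specificity is at least `target_spec` (80% in the study's setup)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    best = 0.0
    for t in sorted(set(scores)):
        # Predict "referable DR" when score >= t.
        spec = sum(n < t for n in neg) / len(neg)
        sens = sum(p >= t for p in pos) / len(pos)
        if spec >= target_spec:
            best = max(best, sens)
    return best
```

The per-model comparison in the abstract amounts to computing these two numbers for each of the 9 models on each test set and comparing them (the study additionally reports P values for the pairwise differences).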
Institution: Nanyang Technological University
Funding: Y.C.T. is supported by the National Medical Research Council's HPHSR Clinician Scientist Award (NMRC/MOH/HCSAINV21nov-0001). The sponsor or funding organization had no role in the design or conduct of this research. This project is supported by the Agency for Science, Technology and Research (A*STAR) under its RIE2020 Health and Biomedical Sciences (HBMS) Industry Alignment Fund Pre-Positioning (IAF-PP) grant no. H20c6a0031. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not reflect the views of A*STAR.
Citation: Goh, J. H. L., Ang, E., Srinivasan, S., Lei, X., Loh, J., Quek, T. C., Xue, C., Xu, X., Liu, Y., Cheng, C., Rajapakse, J. C. & Tham, Y. (2024). Comparative analysis of vision transformers and conventional convolutional neural networks in detecting referable diabetic retinopathy. Ophthalmology Science, 4(6), 100552.
DOI: 10.1016/j.xops.2024.100552
ISSN: 2666-9145
Handle: https://hdl.handle.net/10356/180451
License: © 2024 by the American Academy of Ophthalmology. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Published by Elsevier Inc.