Multi-Dataset Comparison of Vision Transformers and Convolutional Neural Networks for Detecting Glaucomatous Optic Neuropathy from Fundus Photographs

Bibliographic Details
Main Authors:	Elizabeth E. Hwang, Dake Chen, Ying Han, Lin Jia, Jing Shan
Format:	Article
Language:	English
Published:	MDPI AG 2023-10-01
Series:	Bioengineering
Subjects:	glaucoma; deep learning; vision transformer; fundus photography
Online Access:	https://www.mdpi.com/2306-5354/10/11/1266

Abstract

Glaucomatous optic neuropathy (GON) can be diagnosed and monitored using fundus photography, a widely available, low-cost imaging method already adopted for automated screening of other ophthalmic diseases such as diabetic retinopathy. Despite this, the lack of validated early screening approaches remains a major obstacle to preventing glaucoma-related blindness. Deep learning models have attracted significant interest as potential solutions because they offer objective, high-throughput processing of image-based medical data. While convolutional neural networks (CNNs) have been widely used for these purposes, more recent advances in applying Transformer architectures have produced new models, including the Vision Transformer (ViT), that have shown promise in many domains of image analysis. However, previous studies have not sufficiently compared the two architectures side by side on more than a single dataset, making it unclear which is more generalizable or performs better in different clinical contexts. Our purpose is to investigate comparable ViT and CNN models tasked with GON detection from fundus photographs and to highlight their respective strengths and weaknesses. We train CNN and ViT models on six unrelated, publicly available databases and compare their performance using well-established statistics, including the area under the ROC curve (AUC), sensitivity, and specificity. Our results indicate that ViT models often outperform a similarly trained CNN model, particularly when non-glaucomatous images are over-represented in a given dataset. We discuss the clinical implications of these findings and suggest that ViT can further the development of accurate and scalable GON detection for this leading cause of irreversible blindness worldwide.
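
The abstract compares models by AUC, sensitivity, and specificity. As an illustration only, the following is a minimal Python sketch of how these statistics are commonly computed for a binary GON/non-GON classifier; it is not the authors' code. The function names, the 0.5 decision threshold, and the toy scores are hypothetical assumptions. The last lines show why sensitivity and specificity are reported separately when non-glaucomatous images are over-represented: a model that never flags GON can still look accurate.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) at a fixed decision threshold.

    labels: 1 = glaucomatous (GON), 0 = non-glaucomatous.
    scores: predicted probability of GON. The 0.5 threshold is an assumption.
    Assumes both classes are present in `labels`.
    """
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        predicted_gon = s >= threshold
        if y == 1 and predicted_gon:
            tp += 1  # GON image correctly flagged
        elif y == 1:
            fn += 1  # GON image missed
        elif predicted_gon:
            fp += 1  # healthy image wrongly flagged
        else:
            tn += 1  # healthy image correctly passed
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Threshold-free AUC: the probability that a randomly chosen GON image
    scores higher than a randomly chosen non-GON image (ties count as 1/2).
    This pairwise form equals the area under the ROC curve."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one image from each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the study): 2 GON images, 6 non-GON images.
labels = [1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.1, 0.35, 0.05]
sens, spec = sensitivity_specificity(labels, scores)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUC={roc_auc(labels, scores):.3f}")
# prints: sensitivity=0.500 specificity=0.833 AUC=0.917

# A model that labels every image non-glaucomatous is 6/8 = 75% accurate
# on this imbalanced set, yet its sensitivity is 0.
print(sensitivity_specificity(labels, [0.0] * len(labels)))  # prints: (0.0, 1.0)
```

The pairwise AUC loop is quadratic in the number of images, which is fine for a sketch; for full datasets a library implementation such as scikit-learn's `roc_auc_score` is the usual choice.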

Additional Details
Volume/Issue:	Volume 10, Issue 11, Article 1266
ISSN:	2306-5354
DOI:	10.3390/bioengineering10111266
Affiliations:	Department of Ophthalmology, University of California, San Francisco, San Francisco, CA 94143, USA (Elizabeth E. Hwang, Dake Chen, Ying Han, Jing Shan); Digillect LLC, San Francisco, CA 94158, USA (Lin Jia)

Similar Items