Technical and imaging factors influencing performance of deep learning systems for diabetic retinopathy
Abstract Deep learning (DL) has been shown to be effective in developing diabetic retinopathy (DR) algorithms, possibly tackling financial and manpower challenges hindering implementation of DR screening. However, our systematic review of the literature reveals that few studies have examined the impact of diff...
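Main Authors: | Michelle Y. T. Yip, Gilbert Lim, Zhan Wei Lim, Quang D. Nguyen, Crystal C. Y. Chong, Marco Yu, Valentina Bellemo, Yuchen Xie, Xin Qi Lee, Haslina Hamzah, Jinyi Ho, Tien-En Tan, Charumathi Sabanayagam, Andrzej Grzybowski, Gavin S. W. Tan, Wynne Hsu, Mong Li Lee, Tien Yin Wong, Daniel S. W. Ting |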
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio 2020-03-01 |
Series: | npj Digital Medicine |
Online Access: | https://doi.org/10.1038/s41746-020-0247-1 |
_version_ | 1827608584291090432 |
---|---|
author | Michelle Y. T. Yip Gilbert Lim Zhan Wei Lim Quang D. Nguyen Crystal C. Y. Chong Marco Yu Valentina Bellemo Yuchen Xie Xin Qi Lee Haslina Hamzah Jinyi Ho Tien-En Tan Charumathi Sabanayagam Andrzej Grzybowski Gavin S. W. Tan Wynne Hsu Mong Li Lee Tien Yin Wong Daniel S. W. Ting |
author_facet | Michelle Y. T. Yip Gilbert Lim Zhan Wei Lim Quang D. Nguyen Crystal C. Y. Chong Marco Yu Valentina Bellemo Yuchen Xie Xin Qi Lee Haslina Hamzah Jinyi Ho Tien-En Tan Charumathi Sabanayagam Andrzej Grzybowski Gavin S. W. Tan Wynne Hsu Mong Li Lee Tien Yin Wong Daniel S. W. Ting |
author_sort | Michelle Y. T. Yip |
collection | DOAJ |
description | Abstract Deep learning (DL) has been shown to be effective in developing diabetic retinopathy (DR) algorithms, possibly tackling financial and manpower challenges hindering implementation of DR screening. However, our systematic review of the literature reveals that few studies have examined the impact of different factors on these DL algorithms, factors that are important for clinical deployment in real-world settings. Using 455,491 retinal images, we evaluated two technical and three image-related factors in detection of referable DR. For technical factors, the performances of four DL models (VGGNet, ResNet, DenseNet, Ensemble) and two computational frameworks (Caffe, TensorFlow) were evaluated, while for image-related factors, we evaluated image compression levels (reducing image size: 350, 300, 250, 200, 150 KB), number of fields (7-field, 2-field, 1-field) and media clarity (pseudophakic vs phakic). In detection of referable DR, the four DL models showed comparable diagnostic performance (AUC 0.936-0.944). For the VGGNet model, the two computational frameworks had similar AUC (0.936). DL performance dropped when image size decreased below 250 KB (AUC 0.936, 0.900, p < 0.001). DL performance improved as the number of fields increased (dataset 1: 2-field vs 1-field, AUC 0.936 vs 0.908, p < 0.001; dataset 2: 7-field vs 2-field vs 1-field, AUC 0.949 vs 0.911 vs 0.895). DL performed better in pseudophakic than in phakic eyes (AUC 0.918 vs 0.833, p < 0.001). Various image-related factors play more significant roles than technical factors in determining diagnostic performance, suggesting the importance of having robust training and testing datasets for DL training and deployment in real-world settings. |
first_indexed | 2024-03-09T07:15:08Z |
format | Article |
id | doaj.art-465788d7e8eb44a1b4e7db09b3dfd067 |
institution | Directory Open Access Journal |
issn | 2398-6352 |
language | English |
last_indexed | 2024-03-09T07:15:08Z |
publishDate | 2020-03-01 |
publisher | Nature Portfolio |
record_format | Article |
series | npj Digital Medicine |
spelling | doaj.art-465788d7e8eb44a1b4e7db09b3dfd067; 2023-12-03T08:33:29Z; eng; Nature Portfolio; npj Digital Medicine; 2398-6352; 2020-03-01; 31112; 10.1038/s41746-020-0247-1; Technical and imaging factors influencing performance of deep learning systems for diabetic retinopathy; Michelle Y. T. Yip 0; Gilbert Lim 1; Zhan Wei Lim 2; Quang D. Nguyen 3; Crystal C. Y. Chong 4; Marco Yu 5; Valentina Bellemo 6; Yuchen Xie 7; Xin Qi Lee 8; Haslina Hamzah 9; Jinyi Ho 10; Tien-En Tan 11; Charumathi Sabanayagam 12; Andrzej Grzybowski 13; Gavin S. W. Tan 14; Wynne Hsu 15; Mong Li Lee 16; Tien Yin Wong 17; Daniel S. W. Ting 18; Singapore Eye Research Institute, Singapore National Eye Center; Singapore Eye Research Institute, Singapore National Eye Center; School of Computing, National University of Singapore; Singapore Eye Research Institute, Singapore National Eye Center; Singapore Eye Research Institute, Singapore National Eye Center; Singapore Eye Research Institute, Singapore National Eye Center; Singapore Eye Research Institute, Singapore National Eye Center; Singapore Eye Research Institute, Singapore National Eye Center; Singapore Eye Research Institute, Singapore National Eye Center; Singapore Eye Research Institute, Singapore National Eye Center; Singapore Eye Research Institute, Singapore National Eye Center; Singapore Eye Research Institute, Singapore National Eye Center; Singapore Eye Research Institute, Singapore National Eye Center; Department of Ophthalmology, University of Warmia and Mazury; Singapore Eye Research Institute, Singapore National Eye Center; School of Computing, National University of Singapore; School of Computing, National University of Singapore; Singapore Eye Research Institute, Singapore National Eye Center; Singapore Eye Research Institute, Singapore National Eye Center; Abstract Deep learning (DL) has been shown to be effective in developing diabetic retinopathy (DR) algorithms, possibly tackling financial and manpower challenges hindering implementation of DR screening. However, our systematic review of the literature reveals that few studies have examined the impact of different factors on these DL algorithms, factors that are important for clinical deployment in real-world settings. Using 455,491 retinal images, we evaluated two technical and three image-related factors in detection of referable DR. For technical factors, the performances of four DL models (VGGNet, ResNet, DenseNet, Ensemble) and two computational frameworks (Caffe, TensorFlow) were evaluated, while for image-related factors, we evaluated image compression levels (reducing image size: 350, 300, 250, 200, 150 KB), number of fields (7-field, 2-field, 1-field) and media clarity (pseudophakic vs phakic). In detection of referable DR, the four DL models showed comparable diagnostic performance (AUC 0.936-0.944). For the VGGNet model, the two computational frameworks had similar AUC (0.936). DL performance dropped when image size decreased below 250 KB (AUC 0.936, 0.900, p < 0.001). DL performance improved as the number of fields increased (dataset 1: 2-field vs 1-field, AUC 0.936 vs 0.908, p < 0.001; dataset 2: 7-field vs 2-field vs 1-field, AUC 0.949 vs 0.911 vs 0.895). DL performed better in pseudophakic than in phakic eyes (AUC 0.918 vs 0.833, p < 0.001). Various image-related factors play more significant roles than technical factors in determining diagnostic performance, suggesting the importance of having robust training and testing datasets for DL training and deployment in real-world settings.; https://doi.org/10.1038/s41746-020-0247-1 |
spellingShingle | Michelle Y. T. Yip Gilbert Lim Zhan Wei Lim Quang D. Nguyen Crystal C. Y. Chong Marco Yu Valentina Bellemo Yuchen Xie Xin Qi Lee Haslina Hamzah Jinyi Ho Tien-En Tan Charumathi Sabanayagam Andrzej Grzybowski Gavin S. W. Tan Wynne Hsu Mong Li Lee Tien Yin Wong Daniel S. W. Ting Technical and imaging factors influencing performance of deep learning systems for diabetic retinopathy npj Digital Medicine |
title | Technical and imaging factors influencing performance of deep learning systems for diabetic retinopathy |
title_full | Technical and imaging factors influencing performance of deep learning systems for diabetic retinopathy |
title_fullStr | Technical and imaging factors influencing performance of deep learning systems for diabetic retinopathy |
title_full_unstemmed | Technical and imaging factors influencing performance of deep learning systems for diabetic retinopathy |
title_short | Technical and imaging factors influencing performance of deep learning systems for diabetic retinopathy |
title_sort | technical and imaging factors influencing performance of deep learning systems for diabetic retinopathy |
url | https://doi.org/10.1038/s41746-020-0247-1 |
work_keys_str_mv | AT michelleytyip technicalandimagingfactorsinfluencingperformanceofdeeplearningsystemsfordiabeticretinopathy AT gilbertlim technicalandimagingfactorsinfluencingperformanceofdeeplearningsystemsfordiabeticretinopathy AT zhanweilim technicalandimagingfactorsinfluencingperformanceofdeeplearningsystemsfordiabeticretinopathy AT quangdnguyen technicalandimagingfactorsinfluencingperformanceofdeeplearningsystemsfordiabeticretinopathy AT crystalcychong technicalandimagingfactorsinfluencingperformanceofdeeplearningsystemsfordiabeticretinopathy AT marcoyu technicalandimagingfactorsinfluencingperformanceofdeeplearningsystemsfordiabeticretinopathy AT valentinabellemo technicalandimagingfactorsinfluencingperformanceofdeeplearningsystemsfordiabeticretinopathy AT yuchenxie technicalandimagingfactorsinfluencingperformanceofdeeplearningsystemsfordiabeticretinopathy AT xinqilee technicalandimagingfactorsinfluencingperformanceofdeeplearningsystemsfordiabeticretinopathy AT haslinahamzah technicalandimagingfactorsinfluencingperformanceofdeeplearningsystemsfordiabeticretinopathy AT jinyiho technicalandimagingfactorsinfluencingperformanceofdeeplearningsystemsfordiabeticretinopathy AT tienentan technicalandimagingfactorsinfluencingperformanceofdeeplearningsystemsfordiabeticretinopathy AT charumathisabanayagam technicalandimagingfactorsinfluencingperformanceofdeeplearningsystemsfordiabeticretinopathy AT andrzejgrzybowski technicalandimagingfactorsinfluencingperformanceofdeeplearningsystemsfordiabeticretinopathy AT gavinswtan technicalandimagingfactorsinfluencingperformanceofdeeplearningsystemsfordiabeticretinopathy AT wynnehsu technicalandimagingfactorsinfluencingperformanceofdeeplearningsystemsfordiabeticretinopathy AT monglilee technicalandimagingfactorsinfluencingperformanceofdeeplearningsystemsfordiabeticretinopathy AT tienyinwong technicalandimagingfactorsinfluencingperformanceofdeeplearningsystemsfordiabeticretinopathy AT danielswting 
technicalandimagingfactorsinfluencingperformanceofdeeplearningsystemsfordiabeticretinopathy |