Reproduction study using public data of: Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs.

We have attempted to reproduce the results in Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs, published in JAMA 2016; 316(22), using publicly available data sets. We re-implemented the main method in the original study sinc...


Bibliographic Details
Main Authors:	Mike Voets, Kajsa Møllersen, Lars Ailo Bongo
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2019-01-01
Series:	PLoS ONE
Online Access:	https://doi.org/10.1371/journal.pone.0217541

_version_	1827794508686819328
author	Mike Voets Kajsa Møllersen Lars Ailo Bongo
author_facet	Mike Voets Kajsa Møllersen Lars Ailo Bongo
author_sort	Mike Voets
collection	DOAJ
description	We have attempted to reproduce the results in Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs, published in JAMA 2016; 316(22), using publicly available data sets. We re-implemented the main method in the original study since the source code is not available. The original study used non-public fundus images from EyePACS and three hospitals in India for training. We used a different EyePACS data set from Kaggle. The original study used the benchmark data set Messidor-2 to evaluate the algorithm's performance. We used another distribution of the Messidor-2 data set, since the original data set is no longer available. In the original study, ophthalmologists re-graded all images for diabetic retinopathy, macular edema, and image gradability. We have one diabetic retinopathy grade per image for our data sets, and we assessed image gradability ourselves. We were not able to reproduce the original study's results with publicly available data. Our algorithm's area under the receiver operating characteristic curve (AUC) of 0.951 (95% CI, 0.947-0.956) on the Kaggle EyePACS test set and 0.853 (95% CI, 0.835-0.871) on Messidor-2 did not come close to the reported AUC of 0.99 on both test sets in the original study. This may be caused by the use of a single grade per image, or different data. This study shows the challenges of reproducing deep learning method results, and the need for more replication and reproduction studies to validate deep learning methods, especially for medical image analysis. Our source code and instructions are available at: https://github.com/mikevoets/jama16-retina-replication.
first_indexed	2024-03-11T18:34:11Z
format	Article
id	doaj.art-a73b03131ef242888c61c7a38b62c732
institution	Directory Open Access Journal
issn	1932-6203
language	English
last_indexed	2024-03-11T18:34:11Z
publishDate	2019-01-01
publisher	Public Library of Science (PLoS)
record_format	Article
series	PLoS ONE
spelling	doaj.art-a73b03131ef242888c61c7a38b62c732 2023-10-13T05:32:01Z eng Public Library of Science (PLoS) PLoS ONE 1932-6203 2019-01-01 14 6 e0217541 10.1371/journal.pone.0217541 Reproduction study using public data of: Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. Mike Voets Kajsa Møllersen Lars Ailo Bongo We have attempted to reproduce the results in Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs, published in JAMA 2016; 316(22), using publicly available data sets. We re-implemented the main method in the original study since the source code is not available. The original study used non-public fundus images from EyePACS and three hospitals in India for training. We used a different EyePACS data set from Kaggle. The original study used the benchmark data set Messidor-2 to evaluate the algorithm's performance. We used another distribution of the Messidor-2 data set, since the original data set is no longer available. In the original study, ophthalmologists re-graded all images for diabetic retinopathy, macular edema, and image gradability. We have one diabetic retinopathy grade per image for our data sets, and we assessed image gradability ourselves. We were not able to reproduce the original study's results with publicly available data. Our algorithm's area under the receiver operating characteristic curve (AUC) of 0.951 (95% CI, 0.947-0.956) on the Kaggle EyePACS test set and 0.853 (95% CI, 0.835-0.871) on Messidor-2 did not come close to the reported AUC of 0.99 on both test sets in the original study. This may be caused by the use of a single grade per image, or different data. This study shows the challenges of reproducing deep learning method results, and the need for more replication and reproduction studies to validate deep learning methods, especially for medical image analysis. Our source code and instructions are available at: https://github.com/mikevoets/jama16-retina-replication. https://doi.org/10.1371/journal.pone.0217541
spellingShingle	Mike Voets Kajsa Møllersen Lars Ailo Bongo Reproduction study using public data of: Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. PLoS ONE
title	Reproduction study using public data of: Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs.
title_full	Reproduction study using public data of: Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs.
title_fullStr	Reproduction study using public data of: Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs.
title_full_unstemmed	Reproduction study using public data of: Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs.
title_short	Reproduction study using public data of: Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs.
title_sort	reproduction study using public data of development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs
url	https://doi.org/10.1371/journal.pone.0217541
work_keys_str_mv	AT mikevoets reproductionstudyusingpublicdataofdevelopmentandvalidationofadeeplearningalgorithmfordetectionofdiabeticretinopathyinretinalfundusphotographs AT kajsamøllersen reproductionstudyusingpublicdataofdevelopmentandvalidationofadeeplearningalgorithmfordetectionofdiabeticretinopathyinretinalfundusphotographs AT larsailobongo reproductionstudyusingpublicdataofdevelopmentandvalidationofadeeplearningalgorithmfordetectionofdiabeticretinopathyinretinalfundusphotographs

Similar Items