Synthetic Medical Images for Robust, Privacy-Preserving Training of Artificial Intelligence

Bibliographic Details
Main Authors: Aaron S. Coyner, PhD, Jimmy S. Chen, MD, Ken Chang, PhD, Praveer Singh, PhD, Susan Ostmo, MS, R. V. Paul Chan, MD, MSc, Michael F. Chiang, MD, MA, Jayashree Kalpathy-Cramer, PhD, J. Peter Campbell, MD, MPH
Format: Article
Language: English
Published: Elsevier 2022-06-01
Series: Ophthalmology Science
Subjects:
Online Access: http://www.sciencedirect.com/science/article/pii/S266691452200015X
collection DOAJ
description Purpose: Developing robust artificial intelligence (AI) models for medical image analysis requires large quantities of diverse, well-chosen data that can prove challenging to collect because of privacy concerns, disease rarity, or diagnostic label quality. Collecting image-based datasets for retinopathy of prematurity (ROP), a potentially blinding disease, suffers from these challenges. Progressively growing generative adversarial networks (PGANs) may help, because they can synthesize highly realistic images that may increase both the size and diversity of medical datasets.
Design: Diagnostic validation study of convolutional neural networks (CNNs) for plus disease detection, a component of severe ROP, using synthetic data.
Participants: Five thousand eight hundred forty-two retinal fundus images (RFIs) collected from 963 preterm infants.
Methods: Retinal vessel maps (RVMs) were segmented from RFIs. PGANs were trained to synthesize RVMs with normal, pre-plus, or plus disease vasculature. Convolutional neural networks were trained on real or synthetic RVMs to detect plus disease and were evaluated on 2 real RVM test datasets.
Main Outcome Measures: Features of real and synthetic RVMs were evaluated using uniform manifold approximation and projection (UMAP). Similarity was evaluated at the dataset level using Fréchet inception distance and at the feature level using Euclidean distance. CNN performance was assessed via area under the receiver operating characteristic curve (AUC); AUCs were compared via bootstrapping and DeLong’s test for correlated receiver operating characteristic curves. Confusion matrices were compared using McNemar’s chi-square test and Cohen’s κ.
Results: The CNN trained on synthetic RVMs showed a significantly higher AUC (0.971; P = 0.006 and P = 0.004) than the CNN trained on real RVMs (AUC = 0.934) and classified plus disease more similarly to a set of 8 international experts (κ = 0.922 vs. κ = 0.701). Real and synthetic RVMs overlapped, by plus disease diagnosis, on the UMAP manifold, showing that synthetic images spanned the disease severity spectrum. Fréchet inception distance and Euclidean distances suggested that real and synthetic RVMs were more dissimilar to one another than real RVMs were to one another, further suggesting that synthetic RVMs were distinct from the training data with respect to privacy considerations.
Conclusions: Synthetic datasets may be useful for training robust medical AI models. Furthermore, PGANs may be able to synthesize realistic data for use without protected health information concerns.
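The abstract reports that confusion matrices were compared with McNemar’s chi-square test and Cohen’s κ. As a rough, hypothetical sketch (not the authors’ code), the plain-Python functions below compute both from paired label lists. The names `cohens_kappa` and `mcnemar`, the label encoding, and the use of Edwards’ continuity correction are assumptions made for illustration.

```python
"""Hypothetical helpers for comparing paired classifications (illustrative only)."""
import math
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items (any hashable labels)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("expected two non-empty label sequences of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of both raters' marginal frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is undefined, report 1.0.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def mcnemar(truth, pred_a, pred_b, correction=True):
    """McNemar's chi-square test (1 df) on the discordant pairs of two classifiers.

    Returns (statistic, p_value). Only items where exactly one classifier
    matches the reference label contribute to the statistic.
    """
    if not (len(truth) == len(pred_a) == len(pred_b)):
        raise ValueError("truth and predictions must have equal length")
    only_a = only_b = 0
    for t, a, b in zip(truth, pred_a, pred_b):
        if a == t and b != t:
            only_a += 1
        elif b == t and a != t:
            only_b += 1
    if only_a + only_b == 0:
        return 0.0, 1.0  # no discordant pairs: no evidence of a difference
    # Optional Edwards' continuity correction subtracts 1 from |only_a - only_b|.
    diff = abs(only_a - only_b) - (1 if correction else 0)
    statistic = max(diff, 0) ** 2 / (only_a + only_b)
    # Survival function of chi-square with 1 df: P(X > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value
```

For example, `cohens_kappa(cnn_labels, expert_consensus)` (both names hypothetical) would compare a CNN's three-level labels (normal, pre-plus, plus) against a reference standard. When there are only a few discordant pairs, an exact binomial form of McNemar's test is usually preferred over the chi-square approximation shown here.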
first_indexed 2024-12-12T15:34:04Z
id doaj.art-2e71409d77a341a8b6c30ab2a5e34f77
institution Directory Open Access Journal
issn 2666-9145
last_indexed 2024-12-12T15:34:04Z
Citation: Ophthalmology Science, 2022-06-01; 2(2): 100126.
Affiliations:
Aaron S. Coyner, PhD: Department of Ophthalmology, Casey Eye Institute, Oregon Health & Science University, Portland, Oregon
Jimmy S. Chen, MD: Department of Ophthalmology, Casey Eye Institute, Oregon Health & Science University, Portland, Oregon; Department of Ophthalmology, Shiley Eye Institute, University of California, San Diego, San Diego, California
Ken Chang, PhD: Department of Radiology, Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Charlestown, Massachusetts; Center for Clinical Data Science, Massachusetts General Hospital and Boston Women’s Hospital, Boston, Massachusetts
Praveer Singh, PhD: Department of Radiology, Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Charlestown, Massachusetts; Center for Clinical Data Science, Massachusetts General Hospital and Boston Women’s Hospital, Boston, Massachusetts
Susan Ostmo, MS: Department of Ophthalmology, Casey Eye Institute, Oregon Health & Science University, Portland, Oregon
R. V. Paul Chan, MD, MSc: Department of Ophthalmology and Visual Sciences, Eye and Ear Infirmary, University of Illinois, Chicago, Illinois
Michael F. Chiang, MD, MA: National Eye Institute, National Institutes of Health, Bethesda, Maryland
Jayashree Kalpathy-Cramer, PhD: Department of Radiology, Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Charlestown, Massachusetts; Center for Clinical Data Science, Massachusetts General Hospital and Boston Women’s Hospital, Boston, Massachusetts
J. Peter Campbell, MD, MPH: Department of Ophthalmology, Casey Eye Institute, Oregon Health & Science University, Portland, Oregon
Correspondence: J. Peter Campbell, MD, MPH, Department of Ophthalmology, Casey Eye Institute, Oregon Health & Science University, 545 SW Campus Drive, Portland, OR 97239.
topic Artificial intelligence
Deep learning
Generative adversarial network
Retinopathy of prematurity