Evaluating the generalizability of deep learning image classification algorithms to detect middle ear disease using otoscopy

Abstract: To evaluate the generalizability of artificial intelligence (AI) algorithms that use deep learning methods to identify middle ear disease from otoscopic images by comparing internal and external performance. A total of 1842 otoscopic images were collected from three independent sources: (a) Van, Turkey, (b...

Bibliographic Details
Main Authors:	Al-Rahim Habib, Yixi Xu, Kris Bock, Shrestha Mohanty, Tina Sederholm, William B. Weeks, Rahul Dodhia, Juan Lavista Ferres, Chris Perry, Raymond Sacks, Narinder Singh
Format:	Article
Language:	English
Published:	Nature Portfolio 2023-04-01
Series:	Scientific Reports
Online Access:	https://doi.org/10.1038/s41598-023-31921-0

_version_	1797853765839093760
author	Al-Rahim Habib, Yixi Xu, Kris Bock, Shrestha Mohanty, Tina Sederholm, William B. Weeks, Rahul Dodhia, Juan Lavista Ferres, Chris Perry, Raymond Sacks, Narinder Singh
author_facet	Al-Rahim Habib, Yixi Xu, Kris Bock, Shrestha Mohanty, Tina Sederholm, William B. Weeks, Rahul Dodhia, Juan Lavista Ferres, Chris Perry, Raymond Sacks, Narinder Singh
author_sort	Al-Rahim Habib
collection	DOAJ
description	Abstract: To evaluate the generalizability of artificial intelligence (AI) algorithms that use deep learning methods to identify middle ear disease from otoscopic images by comparing internal and external performance. A total of 1842 otoscopic images were collected from three independent sources: (a) Van, Turkey, (b) Santiago, Chile, and (c) Ohio, USA. Diagnostic categories consisted of (i) normal or (ii) abnormal. Deep learning methods were used to develop models to evaluate internal and external performance, using area under the curve (AUC) estimates. A pooled assessment was performed by combining all cohorts together with fivefold cross validation. AI-otoscopy algorithms achieved high internal performance (mean AUC: 0.95, 95% CI: 0.80–1.00). However, performance was reduced when tested on external otoscopic images not used for training (mean AUC: 0.76, 95% CI: 0.61–0.91). Overall, external performance was significantly lower than internal performance (mean difference in AUC: −0.19, p ≤ 0.04). Combining cohorts achieved a substantial pooled performance (AUC: 0.96, standard error: 0.01). Internally applied algorithms for otoscopy performed well in identifying middle ear disease from otoscopy images. However, external performance was reduced when applied to new test cohorts. Further efforts are required to explore data augmentation and pre-processing techniques that might improve external performance and develop a robust, generalizable algorithm for real-world clinical applications.
first_indexed	2024-04-09T19:56:02Z
format	Article
id	doaj.art-178e95a4dae24ad3951de74bfc419172
institution	Directory Open Access Journal
issn	2045-2322
language	English
last_indexed	2024-04-09T19:56:02Z
publishDate	2023-04-01
publisher	Nature Portfolio
record_format	Article
series	Scientific Reports
spelling	doaj.art-178e95a4dae24ad3951de74bfc419172; 2023-04-03T05:27:21Z; eng; Nature Portfolio; Scientific Reports; 2045-2322; 2023-04-01; 13119; 10.1038/s41598-023-31921-0; Evaluating the generalizability of deep learning image classification algorithms to detect middle ear disease using otoscopy; Al-Rahim Habib [0], Yixi Xu [1], Kris Bock [2], Shrestha Mohanty [3], Tina Sederholm [4], William B. Weeks [5], Rahul Dodhia [6], Juan Lavista Ferres [7], Chris Perry [8], Raymond Sacks [9], Narinder Singh [10]; [0] Faculty of Medicine and Health, University of Sydney; [1] AI for Good Lab, Microsoft; [2] Azure FastTrack Engineering; [3] Microsoft; [4] AI for Good Lab, Microsoft; [5] AI for Good Lab, Microsoft; [6] AI for Good Lab, Microsoft; [7] AI for Good Lab, Microsoft; [8] University of Queensland Medical School; [9] Faculty of Medicine and Health, University of Sydney; [10] Faculty of Medicine and Health, University of Sydney; https://doi.org/10.1038/s41598-023-31921-0
spellingShingle	Al-Rahim Habib Yixi Xu Kris Bock Shrestha Mohanty Tina Sederholm William B. Weeks Rahul Dodhia Juan Lavista Ferres Chris Perry Raymond Sacks Narinder Singh Evaluating the generalizability of deep learning image classification algorithms to detect middle ear disease using otoscopy Scientific Reports
title	Evaluating the generalizability of deep learning image classification algorithms to detect middle ear disease using otoscopy
title_full	Evaluating the generalizability of deep learning image classification algorithms to detect middle ear disease using otoscopy
title_fullStr	Evaluating the generalizability of deep learning image classification algorithms to detect middle ear disease using otoscopy
title_full_unstemmed	Evaluating the generalizability of deep learning image classification algorithms to detect middle ear disease using otoscopy
title_short	Evaluating the generalizability of deep learning image classification algorithms to detect middle ear disease using otoscopy
title_sort	evaluating the generalizability of deep learning image classification algorithms to detect middle ear disease using otoscopy
url	https://doi.org/10.1038/s41598-023-31921-0
work_keys_str_mv	AT alrahimhabib evaluatingthegeneralizabilityofdeeplearningimageclassificationalgorithmstodetectmiddleeardiseaseusingotoscopy AT yixixu evaluatingthegeneralizabilityofdeeplearningimageclassificationalgorithmstodetectmiddleeardiseaseusingotoscopy AT krisbock evaluatingthegeneralizabilityofdeeplearningimageclassificationalgorithmstodetectmiddleeardiseaseusingotoscopy AT shresthamohanty evaluatingthegeneralizabilityofdeeplearningimageclassificationalgorithmstodetectmiddleeardiseaseusingotoscopy AT tinasederholm evaluatingthegeneralizabilityofdeeplearningimageclassificationalgorithmstodetectmiddleeardiseaseusingotoscopy AT williambweeks evaluatingthegeneralizabilityofdeeplearningimageclassificationalgorithmstodetectmiddleeardiseaseusingotoscopy AT rahuldodhia evaluatingthegeneralizabilityofdeeplearningimageclassificationalgorithmstodetectmiddleeardiseaseusingotoscopy AT juanlavistaferres evaluatingthegeneralizabilityofdeeplearningimageclassificationalgorithmstodetectmiddleeardiseaseusingotoscopy AT chrisperry evaluatingthegeneralizabilityofdeeplearningimageclassificationalgorithmstodetectmiddleeardiseaseusingotoscopy AT raymondsacks evaluatingthegeneralizabilityofdeeplearningimageclassificationalgorithmstodetectmiddleeardiseaseusingotoscopy AT narindersingh evaluatingthegeneralizabilityofdeeplearningimageclassificationalgorithmstodetectmiddleeardiseaseusingotoscopy
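
The abstract compares internal and external performance through AUC estimates. As a rough illustration only, and not the authors' code, the sketch below computes AUC by pairwise comparison of scores (the Mann-Whitney formulation) and the external-minus-internal gap. The function names `auc` and `auc_gap` and the toy scores are hypothetical; only the two mean AUC values in the last line come from the abstract.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    labels: 1 = abnormal, 0 = normal; scores: higher means "more abnormal".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one normal and one abnormal image")
    # Count abnormal/normal pairs that are ranked correctly; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_gap(internal_auc: float, external_auc: float) -> float:
    """External minus internal AUC; a negative value means a drop on new cohorts."""
    return external_auc - internal_auc


# Toy labels and scores, invented for illustration only (not study data).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auc(toy_labels, toy_scores))    # 0.75

# Mean internal and external AUC as reported in the abstract.
print(round(auc_gap(0.95, 0.76), 2))  # -0.19
```

In practice a library routine such as scikit-learn's `roc_auc_score` would normally replace the hand-written `auc`; the pure-Python version is used here only to keep the sketch dependency-free.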

Similar Items