Deep learning for chest radiograph diagnosis: A retrospective comparison of the CheXNeXt algorithm to practicing radiologists.


Bibliographic Details
Main Authors: Pranav Rajpurkar, Jeremy Irvin, Robyn L Ball, Kaylie Zhu, Brandon Yang, Hershel Mehta, Tony Duan, Daisy Ding, Aarti Bagul, Curtis P Langlotz, Bhavik N Patel, Kristen W Yeom, Katie Shpanskaya, Francis G Blankenberg, Jayne Seekins, Timothy J Amrhein, David A Mong, Safwan S Halabi, Evan J Zucker, Andrew Y Ng, Matthew P Lungren
Format: Article
Language: English
Published: Public Library of Science (PLoS), 2018-11-01
Series: PLoS Medicine
ISSN: 1549-1277, 1549-1676
Online Access: https://doi.org/10.1371/journal.pmed.1002686

Description

Background
Chest radiograph interpretation is critical for the detection of thoracic diseases, including tuberculosis and lung cancer, which affect millions of people worldwide each year. This time-consuming task typically requires expert radiologists to read the images, leading to fatigue-based diagnostic error and a lack of diagnostic expertise in areas of the world where radiologists are not available. Recently, deep learning approaches have achieved expert-level performance on medical image interpretation tasks, powered by large network architectures and fueled by the emergence of large labeled datasets. The purpose of this study was to compare the performance of a deep learning algorithm with that of practicing radiologists in detecting pathologies in chest radiographs.

Methods and findings
We developed CheXNeXt, a convolutional neural network that concurrently detects the presence of 14 different pathologies, including pneumonia, pleural effusion, pulmonary masses, and nodules, in frontal-view chest radiographs. CheXNeXt was trained and internally validated on the ChestX-ray8 dataset, with a held-out validation set of 420 images sampled to contain at least 50 cases of each of the original pathology labels. On this validation set, the majority vote of a panel of 3 board-certified cardiothoracic specialist radiologists served as the reference standard. We compared CheXNeXt's discriminative performance on the validation set to that of 9 radiologists using the area under the receiver operating characteristic curve (AUC). The radiologists comprised 6 board-certified radiologists (average experience 12 years, range 4-28 years) and 3 senior radiology residents from 3 academic institutions. CheXNeXt achieved radiologist-level performance on 11 of the 14 pathologies and fell short of it on the remaining 3. The radiologists achieved statistically significantly higher AUCs on cardiomegaly, emphysema, and hiatal hernia, with AUCs of 0.888 (95% confidence interval [CI] 0.863-0.910), 0.911 (95% CI 0.866-0.947), and 0.985 (95% CI 0.974-0.991), respectively, whereas CheXNeXt's AUCs were 0.831 (95% CI 0.790-0.870), 0.704 (95% CI 0.567-0.833), and 0.851 (95% CI 0.785-0.909), respectively. CheXNeXt performed better than the radiologists in detecting atelectasis, with an AUC of 0.862 (95% CI 0.825-0.895), statistically significantly higher than the radiologists' AUC of 0.808 (95% CI 0.777-0.838); there were no statistically significant differences in AUC for the other 10 pathologies. The average time to interpret the 420 images in the validation set was substantially longer for the radiologists (240 minutes) than for CheXNeXt (1.5 minutes). The main limitations of our study are that neither CheXNeXt nor the radiologists were permitted to use patient history or review prior examinations, and that evaluation was limited to a dataset from a single institution.

Conclusions
In this study, we developed and validated a deep learning algorithm that classified clinically important abnormalities in chest radiographs at a performance level comparable to that of practicing radiologists. Once tested prospectively in clinical settings, the algorithm has the potential to expand patient access to chest radiograph diagnostics.
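
The model described above is a single network with 14 concurrent binary outputs rather than 14 separate classifiers. As a rough illustration only, assuming a DenseNet-121 backbone (an assumption borrowed from related chest radiograph work; the abstract does not name CheXNeXt's architecture), such a multi-label setup might look like this in PyTorch:

```python
# Minimal sketch of a 14-label chest radiograph classifier.
# The DenseNet-121 backbone is an illustrative assumption; the abstract
# only states that a convolutional network was used.
import torch
import torch.nn as nn
from torchvision import models

NUM_PATHOLOGIES = 14  # e.g., pneumonia, pleural effusion, masses, nodules, ...

# In practice one would start from ImageNet-pretrained weights; weights=None
# keeps this sketch runnable without a download.
model = models.densenet121(weights=None)
model.classifier = nn.Linear(model.classifier.in_features, NUM_PATHOLOGIES)

# The 14 findings are not mutually exclusive, so each output is trained as
# an independent binary label with a sigmoid / binary cross-entropy loss.
criterion = nn.BCEWithLogitsLoss()

images = torch.randn(8, 3, 224, 224)                        # dummy radiograph batch
labels = torch.randint(0, 2, (8, NUM_PATHOLOGIES)).float()  # dummy multi-hot labels
loss = criterion(model(images), labels)
loss.backward()
```

Because the findings can co-occur in one radiograph, the head uses independent sigmoid outputs with binary cross-entropy rather than a softmax over pathologies.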
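
The evaluation pairs two simple components: a majority-vote reference standard over 3 expert readers, and per-pathology AUCs with 95% confidence intervals. A minimal sketch on synthetic data, assuming a percentile bootstrap for the CI (the abstract does not say how the intervals were computed):

```python
# Sketch of the evaluation protocol: a majority vote of 3 expert readers
# defines the reference standard, and discrimination is summarized as AUC.
# The percentile-bootstrap CI is an illustrative assumption.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

expert_votes = rng.integers(0, 2, size=(420, 3))      # 3 readers x 420 images
y_true = (expert_votes.sum(axis=1) >= 2).astype(int)  # majority vote
y_score = rng.random(420)                             # model probabilities

auc = roc_auc_score(y_true, y_score)

# Percentile bootstrap over cases for an approximate 95% CI.
boot = []
for _ in range(2000):
    idx = rng.integers(0, 420, size=420)
    if y_true[idx].min() == y_true[idx].max():
        continue  # resample contains a single class; AUC undefined
    boot.append(roc_auc_score(y_true[idx], y_score[idx]))
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"AUC {auc:.3f} (95% CI {lo:.3f}-{hi:.3f})")
```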