Performance evaluation methods for improvements at post-market of artificial intelligence/machine learning-based computer-aided detection/diagnosis/triage in the United States.

Bibliographic Details
Main Authors: Mitsuru Yuba, Kiyotaka Iwasaki
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2023-03-01
Series: PLOS Digital Health
Online Access: https://doi.org/10.1371/journal.pdig.0000209
collection DOAJ
description Computer-aided detection (CADe), computer-aided diagnosis (CADx), and computer-aided simple triage (CAST), which incorporate artificial intelligence (AI) and machine learning (ML), are continually undergoing post-market improvement. Therefore, understanding the evaluation and approval process of improved products is important. This study aimed to comprehensively survey AI/ML-based CAD products approved by the U.S. Food and Drug Administration (FDA) that had been improved post-market, to gain insights into the efficacy and safety required for market approval. A survey of the product code database published by the FDA identified eight products that were improved post-market. The methods used to evaluate the performance of the improvements were analysed, and post-market improvements were approved with retrospective data. Reader study testing (RT) or software standalone testing (SA) procedures were conducted retrospectively. Six RT procedures were conducted because of modifications to the intended use. An average of 17.3 readers (minimum 14, maximum 24) participated, and the area under the curve (AUC) was considered the primary endpoint. The addition of study learning data that did not change the intended use and changes in the analysis algorithm were evaluated by SA. The average sensitivity, specificity, and AUC were 93% (minimum 91.1%, maximum 97%), 89.6% (minimum 85.9%, maximum 96%), and 0.96 (minimum 0.96, maximum 0.97), respectively. The average interval between applications was 348 days (minimum -18, maximum 975), indicating that the improvements were implemented within approximately one year. This is the first comprehensive study of AI/ML-based CAD products that have been improved post-market, and it elucidates evaluation points for post-market improvements. The findings will be informative for industry and academia in developing and improving AI/ML-based CAD.
first_indexed 2024-03-12T05:16:30Z
format Article
id doaj.art-3fd6cb1a26ad466a8acaf5be46e04620
institution Directory Open Access Journal
issn 2767-3170
language English
last_indexed 2024-03-12T05:16:30Z