Measuring the impact of screening automation on meta-analyses of diagnostic test accuracy

Abstract

Background: The large and increasing number of new studies published each year is making literature identification in systematic reviews ever more time-consuming and costly. Technological assistance has been suggested as an alternative to conventional manual study identification to mitigate the cost, but the previous literature has mainly evaluated methods in terms of recall (search sensitivity) and workload reduction. There is also a need to evaluate whether screening prioritization methods lead to the same results and conclusions as exhaustive manual screening. In this study, we examined the impact of one screening prioritization method, based on active learning, on sensitivity and specificity estimates in systematic reviews of diagnostic test accuracy.

Methods: We simulated the screening process in 48 Cochrane reviews of diagnostic test accuracy and re-ran 400 meta-analyses, each based on at least 3 studies. We compared screening prioritization (with technological assistance) against screening in randomized order (standard practice without technological assistance). We examined whether screening could have been stopped before all relevant studies were identified while still producing reliable summary estimates. For all meta-analyses, we also examined the relationship between the number of relevant studies and the reliability of the final estimates.

Results: The main meta-analysis in each systematic review could have been performed after screening an average of 30% of the candidate articles (range 0.07 to 100%). No systematic review would have required screening more than 2308 studies, whereas manual screening would have required screening up to 43,363 studies. Despite an average recall of 70%, the estimation error would have been 1.3% on average, compared with the average 2% estimation error expected when replicating summary estimate calculations.

Conclusion: Screening prioritization coupled with stopping criteria in diagnostic test accuracy reviews can reliably detect when the screening process has identified enough studies to perform the main meta-analysis with an accuracy within pre-specified tolerance limits. However, many of the systematic reviews did not identify enough studies for the meta-analyses to be accurate within a 2% limit even with exhaustive manual screening, i.e., under current practice.
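
The screening prioritization the abstract evaluates is an active-learning loop: a classifier is repeatedly retrained on the articles screened so far, the remaining candidates are re-ranked by predicted relevance, and screening stops once a criterion suggests few relevant articles remain. The sketch below illustrates that loop; it is not the authors' implementation, and the TF-IDF/logistic-regression model, the seed and batch sizes, the patience-based stopping rule, and the load_review_citations loader are all placeholder assumptions.

```python
# Minimal sketch of active-learning screening prioritization with a heuristic
# stopping criterion. NOT the authors' implementation: the model choice
# (TF-IDF + logistic regression), seed/batch sizes, and the patience-based
# stopping rule are illustrative assumptions.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression


def prioritized_screening(texts, labels, seed_size=50, batch_size=25,
                          patience=200, seed=0):
    """Simulate prioritized screening; return article indices in screened order."""
    X = TfidfVectorizer(stop_words="english").fit_transform(texts)
    y = np.asarray(labels)  # 1 = relevant, 0 = irrelevant
    pool = np.random.default_rng(seed).permutation(len(texts)).tolist()
    screened, unscreened = pool[:seed_size], pool[seed_size:]  # random seed set
    since_last_relevant = 0
    while unscreened and since_last_relevant < patience:
        if len(set(y[screened])) < 2:
            # Cannot train until both classes are seen; keep screening randomly.
            batch = unscreened[:batch_size]
        else:
            clf = LogisticRegression(max_iter=1000)
            clf.fit(X[screened], y[screened])
            scores = clf.predict_proba(X[unscreened])[:, 1]
            ranked = np.argsort(-scores)  # most-likely-relevant first
            batch = [unscreened[i] for i in ranked[:batch_size]]
        for idx in batch:  # "screen" the batch, updating the stopping counter
            screened.append(idx)
            unscreened.remove(idx)
            since_last_relevant = 0 if y[idx] == 1 else since_last_relevant + 1
    return screened


# Illustrative use (load_review_citations is a hypothetical loader):
# texts, labels = load_review_citations("some_cochrane_dta_review")
# order = prioritized_screening(texts, labels)
# recall = sum(labels[i] for i in order) / sum(labels)   # fraction of relevant found
# workload_saved = 1 - len(order) / len(labels)          # fraction never screened
```

Recall at the stop point and workload saved are the quantities the study compares against the reliability of the resulting summary estimates.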

Bibliographic Details
Main Authors:
Christopher R. Norman (LIMSI, CNRS, Université Paris Saclay)
Mariska M. G. Leeflang (Amsterdam Public Health, Amsterdam UMC, University of Amsterdam)
Raphaël Porcher (Center for Clinical Epidemiology, Assistance Publique–Hôpitaux de Paris, Hôtel Dieu Hospital; Team METHODS, CRESS, INSERM U1153; University Paris Descartes)
Aurélie Névéol (LIMSI, CNRS, Université Paris Saclay)
Format: Article
Language: English
Published: BMC, 2019-10-01
Series: Systematic Reviews
ISSN: 2046-4053
DOI: 10.1186/s13643-019-1162-x
Collection: DOAJ (Directory of Open Access Journals)
Subjects: Evidence based medicine; *Machine learning; Natural language processing/*methods; *Systematic review as topic
Online Access: http://link.springer.com/article/10.1186/s13643-019-1162-x