Measuring the impact of screening automation on meta-analyses of diagnostic test accuracy
Abstract

Background: The large and increasing number of new studies published each year is making literature identification in systematic reviews ever more time-consuming and costly. Technological assistance has been suggested as an alternative to conventional, manual study identification to mitigate the cost, but previous literature has mainly evaluated methods in terms of recall (search sensitivity) and workload reduction. There is a need to also evaluate whether screening prioritization methods lead to the same results and conclusions as exhaustive manual screening. In this study, we examined the impact of one screening prioritization method based on active learning on sensitivity and specificity estimates in systematic reviews of diagnostic test accuracy.

Methods: We simulated the screening process in 48 Cochrane reviews of diagnostic test accuracy and re-ran 400 meta-analyses based on at least 3 studies. We compared screening prioritization (with technological assistance) and screening in randomized order (standard practice without technological assistance). We examined whether the screening could have been stopped before identifying all relevant studies while still producing reliable summary estimates. For all meta-analyses, we also examined the relationship between the number of relevant studies and the reliability of the final estimates.

Results: The main meta-analysis in each systematic review could have been performed after screening an average of 30% of the candidate articles (range 0.07% to 100%). No systematic review would have required screening more than 2308 studies, whereas manual screening would have required screening up to 43,363 studies. Despite an average recall of 70%, the estimation error would have been 1.3% on average, compared to the average 2% estimation error expected when replicating summary estimate calculations.

Conclusion: Screening prioritization coupled with stopping criteria in diagnostic test accuracy reviews can reliably detect when the screening process has identified a sufficient number of studies to perform the main meta-analysis with an accuracy within pre-specified tolerance limits. However, many of the systematic reviews did not identify enough studies for the meta-analyses to be accurate within a 2% limit even with exhaustive manual screening, i.e., under current practice.
Main Authors:
Christopher R. Norman (LIMSI, CNRS, Université Paris Saclay)
Mariska M. G. Leeflang (Amsterdam Public Health, Amsterdam UMC, University of Amsterdam)
Raphaël Porcher (Center for Clinical Epidemiology, Assistance Publique–Hôpitaux de Paris, Hôtel Dieu Hospital; Team METHODS, CRESS, INSERM U1153; University Paris Descartes)
Aurélie Névéol (LIMSI, CNRS, Université Paris Saclay)
Format: Article
Language: English
Published: BMC, 2019-10-01
Series: Systematic Reviews
ISSN: 2046-4053
Subjects: Evidence based medicine; Machine learning; Natural language processing/methods; Systematic review as topic
Online Access: http://link.springer.com/article/10.1186/s13643-019-1162-x
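The screening prioritization method described in the abstract is based on active learning: a classifier is retrained as studies are screened, and the remaining candidates are re-ranked so that likely-relevant studies surface first. Below is a minimal sketch of such a loop. The feature representation (TF-IDF), the ranker (logistic regression), the batch sizes, and the most-likely-relevant-first (certainty) sampling are all illustrative assumptions, not details taken from this record.

```python
# Sketch of active-learning screening prioritization for citation screening.
# All modeling choices here are assumptions for illustration.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def prioritized_screening(abstracts, labels, seed_size=20, batch_size=50, rng=None):
    """Yield article indices in the order they would be screened.

    abstracts : list of candidate texts (e.g., title + abstract)
    labels    : oracle relevance judgments (1 = included, 0 = excluded),
                used only to simulate the human screener's decisions
    """
    rng = rng or np.random.default_rng(0)
    X = TfidfVectorizer(stop_words="english").fit_transform(abstracts)
    unscreened = list(range(len(abstracts)))
    # Start from a random seed batch, as in screening without prioritization.
    screened = [int(i) for i in rng.choice(unscreened, size=seed_size, replace=False)]
    unscreened = [i for i in unscreened if i not in set(screened)]
    yield from screened
    while unscreened:
        y = [labels[i] for i in screened]
        if len(set(y)) < 2:
            # The model needs both classes; keep sampling at random until then.
            batch = [int(i) for i in
                     rng.choice(unscreened, size=min(batch_size, len(unscreened)),
                                replace=False)]
        else:
            # Retrain on everything screened so far, then rank the rest
            # most-likely-relevant first (certainty sampling).
            model = LogisticRegression(max_iter=1000).fit(X[screened], y)
            scores = model.predict_proba(X[unscreened])[:, 1]
            batch = [unscreened[j] for j in np.argsort(-scores)[:batch_size]]
        yield from batch
        screened.extend(batch)
        taken = set(batch)
        unscreened = [i for i in unscreened if i not in taken]
```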
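The simulation described in the Methods asks whether screening could have been stopped before all relevant studies were found while still producing reliable summary estimates. A retrospective version of that check is sketched below: walk the screening order and report how many articles must be screened before the running summary estimate, based on at least 3 studies, falls within a tolerance (the paper's 2% limit) of the estimate from exhaustive screening. The pooling method here, fixed-effect inverse-variance pooling of logit sensitivity, is a simplifying assumption; Cochrane diagnostic test accuracy reviews typically use hierarchical models.

```python
# Sketch of the retrospective stopping-point check. The pooling model is a
# deliberate simplification, not the method used in the paper.
import math

def pooled_sensitivity(studies):
    """studies: list of (tp, fn) counts; fixed-effect pooled sensitivity."""
    num = den = 0.0
    for tp, fn in studies:
        tp, fn = tp + 0.5, fn + 0.5            # continuity correction
        logit = math.log(tp / fn)
        weight = 1.0 / (1.0 / tp + 1.0 / fn)   # inverse of the logit's variance
        num += weight * logit
        den += weight
    return 1.0 / (1.0 + math.exp(-num / den))  # back-transform to a proportion

def articles_needed(order, relevant, counts, tolerance=0.02):
    """Number of screened articles after which the running pooled estimate
    is within `tolerance` of the estimate from exhaustive screening."""
    final = pooled_sensitivity([counts[i] for i in relevant])
    found = []
    for n, idx in enumerate(order, start=1):
        if idx in relevant:
            found.append(counts[idx])
        # The paper's meta-analyses were based on at least 3 studies.
        if len(found) >= 3 and abs(pooled_sensitivity(found) - final) <= tolerance:
            return n
    return len(order)
```

Because the check compares against the final, oracle estimate, it measures when screening could have stopped in hindsight, matching the simulation design in the abstract rather than a stopping rule a reviewer could apply prospectively.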
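A hypothetical end-to-end run combining the two sketches above, mirroring the abstract's comparison of prioritized and randomized-order screening. Every input below (abstract texts, relevance labels, 2x2 counts) is fabricated toy data; the point is only to show how the pieces fit together.

```python
# Hypothetical usage on fabricated data. The relevant abstracts share
# vocabulary so the ranker has a signal to learn from.
import numpy as np

rng = np.random.default_rng(42)
n = 500
abstracts = [
    (f"sensitivity and specificity of an index test, study {i}" if i < 25
     else f"unrelated clinical topic, article {i}")
    for i in range(n)
]
labels = [1 if i < 25 else 0 for i in range(n)]              # 25 relevant studies
counts = {i: (int(rng.integers(20, 80)),                     # fabricated TP counts
              int(rng.integers(2, 20))) for i in range(n)}   # fabricated FN counts
relevant = {i for i in range(n) if labels[i] == 1}

prioritized = list(prioritized_screening(abstracts, labels, rng=rng))
random_order = [int(i) for i in rng.permutation(n)]

print("articles screened, prioritized:", articles_needed(prioritized, relevant, counts))
print("articles screened, random:    ", articles_needed(random_order, relevant, counts))
```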