Assessing the accuracy of machine-assisted abstract screening with DistillerAI: a user study


Bibliographic Details
Main Authors: Gerald Gartlehner, Gernot Wagner, Linda Lux, Lisa Affengruber, Andreea Dobrescu, Angela Kaminski-Hartenthaler, Meera Viswanathan
Format: Article
Language: English
Published: BMC, 2019-11-01
Series: Systematic Reviews
Subjects: Systematic reviews; Machine-learning; Rapid reviews; Accuracy; Methods study
Online Access:http://link.springer.com/article/10.1186/s13643-019-1221-3
collection DOAJ
description:
Background: Web applications that employ natural language processing technologies to support systematic reviewers during abstract screening have become more common. The goal of our project was to conduct a case study exploring a screening approach that temporarily replaces a human screener with a semi-automated screening tool.
Methods: We evaluated the accuracy of the approach using DistillerAI as a semi-automated screening tool. A published comparative effectiveness review served as the reference standard. Five teams of professional systematic reviewers screened the same 2472 abstracts in parallel. Each team trained DistillerAI with 300 randomly selected abstracts that the team screened dually. For all remaining abstracts, DistillerAI replaced one human screener and provided predictions about the relevance of records. A single reviewer also screened all remaining abstracts, and a second human screener resolved conflicts between the single reviewer and DistillerAI. We compared the decisions of the machine-assisted approach, single-reviewer screening, and screening with DistillerAI alone against the reference standard.
Results: The combined sensitivity of the machine-assisted screening approach across the five screening teams was 78% (95% confidence interval [CI], 66 to 90%), and the combined specificity was 95% (95% CI, 92 to 97%). By comparison, the sensitivity of single-reviewer screening was similar (78%; 95% CI, 66 to 89%); however, the sensitivity of DistillerAI alone was substantially worse (14%; 95% CI, 0 to 31%) than that of the machine-assisted screening approach. Specificities for single-reviewer screening and DistillerAI were 94% (95% CI, 91 to 97%) and 98% (95% CI, 97 to 100%), respectively. Machine-assisted screening and single-reviewer screening had similar areas under the curve (0.87 and 0.86, respectively); by contrast, the area under the curve for DistillerAI alone was only slightly better than chance (0.56). The interrater agreement between human screeners and DistillerAI, measured with a prevalence-adjusted kappa, was 0.85 (95% CI, 0.84 to 0.86).
Conclusions: The accuracy of DistillerAI is not yet adequate to replace a human screener temporarily during abstract screening for systematic reviews. Rapid reviews, which do not require detecting the totality of the relevant evidence, may find semi-automation tools to have greater utility than traditional systematic reviews.
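The accuracy measures named in the abstract (sensitivity, specificity, and a prevalence-adjusted kappa) can be made concrete with a short sketch. The counts below are hypothetical, chosen only so that the total matches the study's 2472 abstracts; they are not the study's data. The sketch also assumes the prevalence-adjusted, bias-adjusted (PABAK) formulation of kappa, 2 * p_o - 1, which is one common reading of "prevalence-adjusted kappa".

```python
# Illustrative only: screening accuracy metrics from a 2x2 confusion
# matrix (tool decision vs. reference standard). Counts are invented
# for demonstration and merely sum to the study's 2472 abstracts.

def screening_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, and PABAK for a 2x2 table."""
    sensitivity = tp / (tp + fn)          # relevant records correctly included
    specificity = tn / (tn + fp)          # irrelevant records correctly excluded
    observed_agreement = (tp + tn) / (tp + fp + fn + tn)
    pabak = 2 * observed_agreement - 1    # prevalence-adjusted, bias-adjusted kappa
    return sensitivity, specificity, pabak

# Hypothetical counts: 100 truly relevant records, 2372 irrelevant.
sens, spec, kappa = screening_metrics(tp=78, fp=120, fn=22, tn=2252)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} PABAK={kappa:.2f}")
```

Note that PABAK depends only on raw agreement, so in low-prevalence screening sets (few relevant records among thousands) it can be high even when sensitivity is modest, which is why the abstract reports both.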
ISSN: 2046-4053
DOI: 10.1186/s13643-019-1221-3
Author affiliations:
Gerald Gartlehner: RTI International–University of North Carolina Evidence-based Practice Center
Gernot Wagner: Department for Evidence-based Medicine and Evaluation, Danube University Krems
Linda Lux: RTI International–University of North Carolina Evidence-based Practice Center
Lisa Affengruber: Department for Evidence-based Medicine and Evaluation, Danube University Krems
Andreea Dobrescu: Department for Evidence-based Medicine and Evaluation, Danube University Krems
Angela Kaminski-Hartenthaler: Department for Evidence-based Medicine and Evaluation, Danube University Krems
Meera Viswanathan: RTI International–University of North Carolina Evidence-based Practice Center