Assessing the accuracy of machine-assisted abstract screening with DistillerAI: a user study

Gerald Gartlehner; Gernot Wagner; Linda Lux; Lisa Affengruber; Andreea Dobrescu; Angela Kaminski-Hartenthaler; Meera Viswanathan

doi:10.1186/s13643-019-1221-3

Source

Systematic Reviews, 2019, volume 8, issue 1, pages 1-10

Abstract

Background

Web applications that employ natural language processing technologies to support systematic reviewers during abstract screening have become more common. The goal of our project was to conduct a case study to explore a screening approach that temporarily replaces a human screener with a semi-automated screening tool.

Methods

We evaluated the accuracy of the approach using DistillerAI as a semi-automated screening tool. A published comparative effectiveness review served as the reference standard. Five teams of professional systematic reviewers screened the same 2472 abstracts in parallel. Each team trained DistillerAI with 300 randomly selected abstracts that the team screened dually. For all remaining abstracts, DistillerAI replaced one human screener and provided predictions about the relevance of records. A single reviewer also screened all remaining abstracts. A second human screener resolved conflicts between the single reviewer and DistillerAI. We compared the decisions of the machine-assisted approach, single-reviewer screening, and screening with DistillerAI alone against the reference standard.
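The screening workflow described above can be sketched in a few lines. This is only an illustration, not the study's implementation: the function names, the fixed random seed, and the `resolve_conflict` callback are hypothetical, and the sketch assumes that a decision stands whenever the single reviewer and the tool agree, which the abstract does not state explicitly.

```python
import random


def split_training(record_ids, n_train=300, seed=0):
    """Randomly pick n_train records for dual human screening (tool training).

    Returns (training_ids, remaining_ids). The seed is an assumption made
    only so that the sketch is reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(record_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


def machine_assisted_decision(reviewer_include, tool_include, resolve_conflict):
    """Combine one human decision with the tool's prediction for one abstract.

    Assumption: agreement is accepted as final; on disagreement a second
    human screener (modelled here as a callback) makes the decision.
    """
    if reviewer_include == tool_include:
        return reviewer_include
    return resolve_conflict()


# 2472 abstracts in total, 300 used for training, the rest screened
# by one reviewer plus the tool.
training, remaining = split_training(range(2472))
```

With the numbers from the study, 300 abstracts go to training and 2172 remain for machine-assisted screening.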

Results

The combined sensitivity of the machine-assisted screening approach across the five screening teams was 78% (95% confidence interval [CI], 66 to 90%), and the combined specificity was 95% (95% CI, 92 to 97%). By comparison, the sensitivity of single-reviewer screening was similar (78%; 95% CI, 66 to 89%); however, the sensitivity of DistillerAI alone was substantially worse (14%; 95% CI, 0 to 31%) than that of the machine-assisted screening approach. Specificities for single-reviewer screening and DistillerAI were 94% (95% CI, 91 to 97%) and 98% (95% CI, 97 to 100%), respectively. Machine-assisted screening and single-reviewer screening had similar areas under the curve (0.87 and 0.86, respectively); by contrast, the area under the curve for DistillerAI alone was just slightly better than chance (0.56). Interrater agreement between human screeners and DistillerAI, measured with a prevalence-adjusted kappa, was 0.85 (95% CI, 0.84 to 0.86).
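The accuracy measures reported above can be illustrated with a short sketch. The abstract does not say how the areas under the curve were computed; the sketch assumes the standard result that, for a binary include/exclude decision with a single operating point, the area under the curve equals the mean of sensitivity and specificity. The reported values are consistent with that reading. The kappa function assumes the common two-category prevalence-adjusted bias-adjusted form (2 × observed agreement − 1), which is likewise not confirmed by the abstract. All function names are hypothetical.

```python
def sensitivity(true_pos, false_neg):
    """Share of truly relevant abstracts that the approach included."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of truly irrelevant abstracts that the approach excluded."""
    return true_neg / (true_neg + false_pos)


def single_point_auc(sens, spec):
    """Area under the ROC curve for one binary operating point.

    The curve runs (0, 0) -> (1 - spec, sens) -> (1, 1), so the area
    reduces to the mean of sensitivity and specificity.
    """
    return (sens + spec) / 2


def pabak(observed_agreement):
    """Prevalence-adjusted bias-adjusted kappa for two categories (assumed form)."""
    return 2 * observed_agreement - 1


# Reported (sensitivity, specificity) pairs from the Results paragraph.
reported = {
    "machine-assisted": (0.78, 0.95),
    "single reviewer": (0.78, 0.94),
    "DistillerAI alone": (0.14, 0.98),
}
for name, (sens, spec) in reported.items():
    print(name, single_point_auc(sens, spec))
```

Under these assumptions the three pairs give areas of about 0.87, 0.86, and 0.56, matching the reported figures. A kappa of 0.85 would correspond to the human screeners and DistillerAI agreeing on roughly 92.5% of abstracts.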

Conclusions

The accuracy of DistillerAI is not yet adequate to replace a human screener temporarily during abstract screening for systematic reviews. Rapid reviews, which do not require detecting the totality of the relevant evidence, may find semi-automation tools to have greater utility than traditional systematic reviews.

Identifiers

Journal e-ISSN: 2046-4053
DOI: 10.1186/s13643-019-1221-3

Authors

Gerald Gartlehner

RTI International–University of North Carolina Evidence-based Practice Center, Research Triangle Park, USA
Danube University Krems, Department for Evidence-based Medicine and Evaluation, Krems, Austria

Gernot Wagner

Danube University Krems, Department for Evidence-based Medicine and Evaluation, Krems, Austria

Linda Lux

RTI International–University of North Carolina Evidence-based Practice Center, Research Triangle Park, USA

Lisa Affengruber

Danube University Krems, Department for Evidence-based Medicine and Evaluation, Krems, Austria
Maastricht University, Department of Family Medicine, Care and Public Health Research Institute (CAPHRI), Maastricht, The Netherlands

Keywords

Systematic reviews; Machine-learning; Rapid reviews; Accuracy; Methods study

Additional information

Publication languages: English

Data set: Springer

Publisher

BioMed Central
