Reducing systematic review burden using Deduklick: a novel, automated, reliable, and explainable deduplication algorithm to foster medical research

Cited by: 53
Authors
Borissov, Nikolay [1, 2]
Haas, Quentin [1, 2]
Minder, Beatrice [3]
Kopp-Heim, Doris [3]
von Gernler, Marc [4]
Janka, Heidrun [4]
Teodoro, Douglas [5, 6]
Amini, Poorya [1, 2]
Affiliations
[1] Univ Bern, Risklick AG, Spin Off, Bern, Switzerland
[2] Univ Bern, CTU Bern, Bern, Switzerland
[3] Univ Bern, Univ Lib Bern, Publ Hlth & Primary Care Lib, Bern, Switzerland
[4] Univ Bern, Univ Lib Bern, Med Lib, Bern, Switzerland
[5] Univ Appl Sci & Arts Western Switzerland, Geneva, Switzerland
[6] Univ Geneva, Dept Radiol & Med Informat, Geneva, Switzerland
Keywords
Artificial intelligence; Systematic review; Deduplication; Risklick; Bibliographic databases; Duplicate references; Systematic review software; TOOLS;
DOI
10.1186/s13643-022-02045-9
Chinese Library Classification number
R5 [Internal Medicine];
Discipline classification code
1002 ; 100201 ;
Abstract
Background: Identifying and removing duplicate references when conducting systematic reviews (SRs) remain a major, time-consuming issue for authors, who check for duplicates manually using the built-in features of citation managers. To address the problems of manual deduplication, we developed an automated, efficient, and rapid artificial intelligence-based algorithm named Deduklick. Deduklick combines natural language processing algorithms with a set of rules created by expert information specialists.
Methods: Deduklick is a multistep algorithm that normalizes the data, calculates a similarity score, and identifies unique and duplicate references based on metadata fields such as title, authors, journal, DOI, year, issue, volume, and page number range. On eight existing heterogeneous datasets, we measured Deduklick's capacity to accurately detect duplicates and compared it with the information specialists' standard manual duplicate removal process using EndNote. Using a sensitivity analysis, we manually cross-compared the efficiency and noise of both methods.
Results: Deduklick achieved an average recall of 99.51%, an average precision of 100.00%, and an average F1 score of 99.75%. In contrast, the manual deduplication process achieved an average recall of 88.65%, an average precision of 99.95%, and an average F1 score of 91.98%. Deduklick's performance on duplicate removal was equal to or higher than expert level. It also preserved high metadata quality and drastically reduced the time spent on analysis. Deduklick represents an efficient, transparent, ergonomic, and time-saving solution for identifying and removing duplicates in SR searches. Deduklick could therefore simplify SR production and offer important advantages for scientists, including saving time, increasing accuracy, reducing costs, and contributing to quality SRs.
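
The pipeline described in the abstract (normalize the metadata, compute a similarity score, then label each reference as unique or duplicate) can be illustrated with a minimal sketch. This is not Deduklick's implementation, which is not given in this record. The function names, the choice of `difflib.SequenceMatcher` as the string similarity measure, the 0.8/0.2 weights, the 0.9 threshold, and the rule that identical DOIs imply a duplicate are all assumptions made for illustration, and only the title, year, and DOI fields are used.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text or "")
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def normalize_doi(doi):
    """Lowercase a DOI and drop common resolver prefixes."""
    doi = (doi or "").strip().lower()
    return re.sub(r"^(https?://(dx\.)?doi\.org/|doi:\s*)", "", doi)


def similarity(a, b):
    """Hypothetical score in [0, 1]; NOT the published scoring function."""
    doi_a, doi_b = normalize_doi(a.get("doi")), normalize_doi(b.get("doi"))
    if doi_a and doi_a == doi_b:
        return 1.0  # assumption: identical DOIs mean the same reference
    title_score = SequenceMatcher(
        None, normalize(a.get("title")), normalize(b.get("title"))
    ).ratio()
    same_year = a.get("year") is not None and a.get("year") == b.get("year")
    # Assumed weights: title dominates, matching year adds a small bonus.
    return 0.8 * title_score + 0.2 * (1.0 if same_year else 0.0)


def deduplicate(records, threshold=0.9):
    """Split records into unique references and (duplicate, original) pairs."""
    unique, duplicates = [], []
    for rec in records:
        match = next((u for u in unique if similarity(rec, u) >= threshold), None)
        if match is None:
            unique.append(rec)
        else:
            duplicates.append((rec, match))
    return unique, duplicates
```

Returning each duplicate together with the record it matched keeps the decision traceable, in the spirit of the "explainable" property claimed in the title. The pairwise loop is quadratic in the number of references, so a real tool would need blocking or indexing for large search exports.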
Pages: 10