Queries related to COVID-19: a more effective retrieval through finetuned ALBERT with BM25L question answering system

Cited by: 4
Authors
Godavarthi, Deepthi [1]
Sowjanya, A. Mary [1]
Affiliation
[1] Andhra Univ, Dept Comp Sci & Syst Engn, Coll Engn A, Visakhapatnam, Andhra Pradesh, India
Keywords
ALBERT; Answer extraction; BM25; BM25L; CORD-19; Context retrieval; COVID-19; Natural language processing; Question answering; SQuAD
DOI
10.1108/WJE-01-2021-0059
Chinese Library Classification number
T [Industrial Technology]
Discipline classification code
08
Abstract
Purpose: This paper aims to build a question answering (QA) system that improves the retrieval of answers to COVID-19 queries from the COVID-19 Open Research Dataset (CORD-19). Because CORD-19 is an up-to-date collection of coronavirus literature, text mining approaches can be used to retrieve answers to coronavirus-related questions. The existing ALBERT model (A Lite BERT for Self-supervised Learning of Language Representations) is finetuned to retrieve COVID-relevant information in response to scientific questions posed by the medical community and to highlight the context related to the COVID-19 query.

Design/methodology/approach: This study presents a finetuned ALBERT-based QA system combined with the Okapi BM25 (Best Match 25) ranking function and its variant BM25L for context retrieval. The system achieved high scores on benchmark data sets such as SQuAD. It was pre-trained on SQuAD and finetuned on CORD-19 data to retrieve answers to COVID-19 questions by extracting information semantically relevant to the question.

Findings: BM25L is found to be more effective in retrieval than Okapi BM25. The finetuned ALBERT, when extended to the CORD-19 data set, provided accurate results.

Originality/value: The finetuned ALBERT QA system was developed and tested for the first time on the CORD-19 data set to extract context and highlight the answer span for greater clarity to the user.
Pages: 109-113
Number of pages: 5
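
The abstract compares Okapi BM25 with its variant BM25L for context retrieval but gives no formulas. The following is a minimal sketch of the two scoring functions, not the authors' implementation. The whitespace tokenizer, the IDF form log((N + 1) / (df + 0.5)), the parameter values k1 = 1.2, b = 0.75 and delta = 0.5, and all function names are assumptions made for illustration. BM25L adds a constant delta to the length-normalized term frequency, which lifts the score of long documents that standard BM25 tends to penalize.

```python
import math
from collections import Counter


def build_index(docs):
    """Tokenize on whitespace (an assumption) and collect corpus statistics."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for t in tokenized for term in set(t))
    return tokenized, n, avgdl, df


def idf(term, n, df):
    # Assumed IDF form; always positive because df <= n.
    return math.log((n + 1) / (df[term] + 0.5))


def bm25(query, doc, n, avgdl, df, k1=1.2, b=0.75):
    """Okapi BM25 score of one tokenized document for a query string."""
    tf = Counter(doc)
    norm = 1 - b + b * len(doc) / avgdl  # document length normalization
    score = 0.0
    for term in query.lower().split():
        if tf[term] == 0:
            continue
        score += idf(term, n, df) * tf[term] * (k1 + 1) / (tf[term] + k1 * norm)
    return score


def bm25l(query, doc, n, avgdl, df, k1=1.2, b=0.75, delta=0.5):
    """BM25L score: BM25 with delta added to the normalized term frequency."""
    tf = Counter(doc)
    norm = 1 - b + b * len(doc) / avgdl
    score = 0.0
    for term in query.lower().split():
        if tf[term] == 0:
            continue
        c = tf[term] / norm  # length-normalized term frequency
        score += idf(term, n, df) * (k1 + 1) * (c + delta) / (k1 + c + delta)
    return score


if __name__ == "__main__":
    # Hypothetical toy corpus, not CORD-19 data.
    docs = [
        "coronavirus incubation period is about five days",
        "influenza vaccine schedule",
    ]
    tokenized, n, avgdl, df = build_index(docs)
    for doc in tokenized:
        print(bm25("incubation period", doc, n, avgdl, df),
              bm25l("incubation period", doc, n, avgdl, df))
```

With delta set to 0 the BM25L function reduces to BM25, which makes the effect of the shift easy to isolate. In a pipeline of the kind the abstract describes, the top-ranked passages would then be passed to the finetuned ALBERT reader for answer span extraction.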