A passage retrieval method based on probabilistic information retrieval model and UMLS concepts in biomedical question answering

Cited by: 36
Authors
Sarrouti, Mourad [1]
Ouatik El Alaoui, Said [1]
Affiliations
[1] Sidi Mohammed Ben Abdellah Univ, FSDM, Lab Comp Sci & Modeling, Fes, Morocco
Keywords
Biomedical question answering system; Biomedical passage retrieval; Probabilistic information retrieval model; Unified medical language system; Natural language processing; Biomedical informatics; COMPLEX CLINICAL QUESTIONS; QUERY EXPANSION; SYSTEM
DOI
10.1016/j.jbi.2017.03.001
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Background and Objective: Passage retrieval, the identification of top-ranked passages that may contain the answer to a given biomedical question, is a crucial component of any biomedical question answering (QA) system. Passage retrieval in open-domain QA is a longstanding challenge that has been widely studied over the last decades, but it still requires further effort in biomedical QA. In this paper, we present a new biomedical passage retrieval method based on Stanford CoreNLP sentence/passage length, a probabilistic information retrieval (IR) model, and UMLS concepts. Methods: In the proposed method, we first use our document retrieval system, based on the PubMed search engine and UMLS similarity, to retrieve documents relevant to a given biomedical question. We then take the abstracts of the retrieved documents and use the Stanford CoreNLP sentence splitter to produce a set of sentences, i.e., candidate passages. Finally, using stemmed words and UMLS concepts as features for the BM25 model, we compute the similarity score between the biomedical question and each candidate passage and keep the N top-ranked passages. Results: Experimental evaluations performed on large standard datasets provided by the BioASQ challenge show that the proposed method performs well compared with current state-of-the-art methods, significantly outperforming them by an average of 6.84% in terms of mean average precision (MAP). Conclusion: We have proposed an efficient passage retrieval method that can be used to retrieve relevant passages in biomedical QA systems with high mean average precision. © 2017 Elsevier Inc. All rights reserved.
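
The final ranking step described in the abstract can be illustrated with a minimal sketch of Okapi BM25 scoring over candidate sentences. This is not the authors' implementation: the function names `tokenize` and `bm25_rank` are hypothetical, the parameter values `k1=1.2` and `b=0.75` are common defaults rather than values reported in the abstract, and plain lowercase tokens stand in for the stemmed words and UMLS concepts that the paper uses as features.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase alphanumeric tokens stand in for the stemmed
    # words and UMLS concepts used as features in the paper.
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(question, passages, k1=1.2, b=0.75, top_n=3):
    """Return the top_n (score, passage) pairs, best first (Okapi BM25)."""
    docs = [tokenize(p) for p in passages]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs if n_docs else 0.0

    # Document frequency: number of passages containing each term.
    df = Counter(term for d in docs for term in set(d))
    query_terms = set(tokenize(question))

    scored = []
    for passage, doc in zip(passages, docs):
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF variant that stays non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with passage-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scored.append((score, passage))

    # Stable sort on score only, so ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    # Illustrative candidate passages, not data from the paper.
    candidates = [
        "Cystic fibrosis is caused by mutations in the CFTR gene.",
        "Asthma is a chronic disease of the airways.",
        "The CFTR protein regulates chloride transport.",
    ]
    for score, passage in bm25_rank("Which gene is mutated in cystic fibrosis?", candidates):
        print(f"{score:.3f}  {passage}")
```

In a fuller pipeline, the `tokenize` step would be replaced by stemming plus concept mapping, and `top_n` would correspond to the N top-ranked passages kept by the method.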
Pages: 96-103
Number of pages: 8