Hybrid query expansion using lexical resources and word embeddings for sentence retrieval in question answering

Cited by: 96
Authors
Esposito, Massimo [1]
Damiano, Emanuele [1]
Minutolo, Aniello [1]
De Pietro, Giuseppe [1]
Fujita, Hamido [2]
Affiliations
[1] National Research Council of Italy, Institute for High Performance Computing and Networking (ICAR), Naples, Italy
[2] Iwate Prefectural University, Takizawa, Iwate, Japan
Keywords
Query expansion; Question answering; Information retrieval; Lexical resources; Word embeddings; Sentence retrieval
DOI
10.1016/j.ins.2019.12.002
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Question Answering (QA) systems based on Information Retrieval return precise answers to natural language questions by extracting relevant sentences from document collections. However, questions and sentences are often not terminologically aligned (the same concept may be expressed with different words), which causes errors in sentence retrieval. To improve the effectiveness of retrieving relevant sentences from documents, this paper proposes a hybrid Query Expansion (QE) approach for QA systems, based on lexical resources and word embeddings. First, synonyms and hypernyms of the relevant terms occurring in the question are extracted from MultiWordNet and then contextualized to the document collection used by the QA system. Finally, the resulting set of terms is ranked and filtered according to the wording and sense of the question, using a semantic similarity metric built on top of a Word2Vec model. This model is trained locally on an extended corpus covering the same topic as the documents used in the QA system. The QE approach is implemented in an existing QA system and evaluated experimentally for the Italian language and the Cultural Heritage domain, against different possible configurations and selected baselines, assessing its effectiveness in retrieving sentences that contain correct answers to questions from four different categories. © 2019 Elsevier Inc. All rights reserved.
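The three stages summarized in the abstract (candidate extraction from a lexical resource, contextualization to the collection, similarity-based ranking and filtering) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a small hand-written dictionary stands in for MultiWordNet, toy two-dimensional vectors stand in for the locally trained Word2Vec model, "contextualization" is approximated as keeping only candidates that occur in the collection vocabulary, and the ranking score is assumed to be the cosine similarity between a candidate and the centroid of the question-term vectors. The function `expand_query` and its `threshold` and `top_k` parameters are hypothetical, and the question terms are assumed to be already selected (e.g. after stop-word removal).

```python
import math
from typing import Dict, List, Sequence, Set

# Hypothetical sketch of a three-stage hybrid query expansion.
# A toy dictionary stands in for MultiWordNet and toy vectors stand in
# for a locally trained Word2Vec model; neither is the authors' data.

Vector = Sequence[float]
Lexicon = Dict[str, Dict[str, List[str]]]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


def expand_query(
    question_terms: List[str],
    lexicon: Lexicon,
    collection_vocab: Set[str],
    embeddings: Dict[str, Vector],
    threshold: float = 0.5,
    top_k: int = 5,
) -> List[str]:
    # Stage 1: collect synonyms and hypernyms of each question term.
    candidates: Set[str] = set()
    for term in question_terms:
        entry = lexicon.get(term, {})
        candidates.update(entry.get("synonyms", []))
        candidates.update(entry.get("hypernyms", []))
    candidates -= set(question_terms)

    # Stage 2 (assumed form of contextualization): keep only candidates
    # that actually occur in the document collection.
    candidates &= collection_vocab

    # Stage 3 (assumed metric): score each candidate by cosine similarity
    # to the centroid of the question-term vectors, then filter and rank.
    question_vectors = [embeddings[t] for t in question_terms if t in embeddings]
    if not question_vectors:
        return []
    q = centroid(question_vectors)
    scored = [(cosine(embeddings[c], q), c) for c in candidates if c in embeddings]
    kept = [(s, c) for s, c in scored if s >= threshold]
    kept.sort(key=lambda sc: (-sc[0], sc[1]))  # best score first, ties by name
    return [c for _, c in kept[:top_k]]


# Toy usage example with made-up data.
LEXICON: Lexicon = {
    "church": {"synonyms": ["basilica", "temple"], "hypernyms": ["building"]},
    "built": {"synonyms": ["erected", "constructed"], "hypernyms": ["made"]},
}
COLLECTION_VOCAB = {"church", "built", "basilica", "building", "erected", "made", "museum"}
EMBEDDINGS: Dict[str, Vector] = {
    "church": [1.0, 0.0],
    "built": [0.0, 1.0],
    "basilica": [0.9, 0.1],
    "temple": [1.0, 0.1],
    "building": [0.7, 0.7],
    "erected": [0.2, 0.9],
    "made": [-0.9, 0.3],
}

expanded = expand_query(["church", "built"], LEXICON, COLLECTION_VOCAB, EMBEDDINGS)
print(expanded)  # ['building', 'erected', 'basilica']
```

In this toy run, `temple` and `constructed` are discarded because they never occur in the collection, and `made` is discarded because its vector points away from the question's centroid. In a real setting the vectors would come from a Word2Vec model trained on domain text, and the centroid-based cosine score would be replaced by whatever similarity metric the paper actually defines.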
Pages: 88-105
Number of pages: 18