A new approach for query expansion using Wikipedia and WordNet

Cited by: 55
Authors
Azad, Hiteshwar Kumar [1]
Deepak, Akshay [1]
Affiliations
[1] Natl Inst Technol, Dept Comp Sci & Engn, Patna, Bihar, India
Keywords
Query expansion; Information retrieval; WordNet; Wikipedia; Relevance; Model
DOI
10.1016/j.ins.2019.04.019
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Query expansion (QE) is a well-known technique for improving the effectiveness of information retrieval. QE reformulates the initial query by adding similar terms that help retrieve more relevant results. Several approaches proposed in the literature produce favorable results, but they are not equally effective for all types of queries (individual and phrase queries). One of the main reasons is that the same kind of data source and weighting scheme is used to expand both individual and phrase query terms. As a result, the holistic relationship among the query terms is not well captured or scored. To address this issue, we present a new approach for QE that uses Wikipedia and WordNet as data sources. Specifically, Wikipedia gives rich expansion terms for phrase terms, while WordNet does the same for individual terms. We also propose novel weighting schemes for expansion terms: an in-link score (for terms extracted from Wikipedia) and a tf-idf based scheme (for terms extracted from WordNet). In the proposed Wikipedia-WordNet-based QE technique (WWQE), the expansion terms are weighted twice: first, each term is scored individually by its weighting scheme; then, the selected expansion terms are scored with respect to the entire query using a correlation score. The proposed approach improves the MAP score by 24% and the GMAP score by 48% over unexpanded queries on the FIRE dataset. Experimental results show a significant improvement over individual expansion and other related state-of-the-art approaches. We also analyze how varying the number of expansion terms affects the retrieval effectiveness of the proposed technique. (C) 2019 Elsevier Inc. All rights reserved.
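The two-pass weighting described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation: the abstract does not give the formulas, so the smoothed tf-idf in `stage1_tfidf`, the Jaccard co-occurrence used as a stand-in "correlation score" in `stage2_correlation`, the `keep` and `top_n` parameters, and the toy corpus are all assumptions made for illustration. The in-link score for Wikipedia terms is not modeled here.

```python
import math
from collections import Counter


def stage1_tfidf(candidates, docs):
    """Pass 1: score each candidate term on its own (assumed smoothed tf-idf)."""
    n = len(docs)
    tf = Counter(tok for doc in docs for tok in doc)
    scores = {}
    for term in candidates:
        df = sum(1 for doc in docs if term in doc)
        idf = math.log((1 + n) / (1 + df)) + 1.0  # smoothing is an assumption
        scores[term] = tf[term] * idf
    return scores


def jaccard_cooccurrence(a, b, docs):
    """Share of documents containing both terms among those containing either."""
    docs_a = {i for i, doc in enumerate(docs) if a in doc}
    docs_b = {i for i, doc in enumerate(docs) if b in doc}
    union = docs_a | docs_b
    return len(docs_a & docs_b) / len(union) if union else 0.0


def stage2_correlation(term, query_terms, docs):
    """Pass 2: score a term against the whole query (assumed: mean Jaccard)."""
    if not query_terms:
        return 0.0
    total = sum(jaccard_cooccurrence(term, q, docs) for q in query_terms)
    return total / len(query_terms)


def expand_query(query_terms, candidates, docs, keep=3, top_n=2):
    """Shortlist by the individual score, then re-rank against the full query."""
    first = stage1_tfidf(candidates, docs)
    shortlisted = sorted(first, key=lambda t: (-first[t], t))[:keep]
    second = {t: stage2_correlation(t, query_terms, docs) for t in shortlisted}
    ranked = sorted(second, key=lambda t: (-second[t], t))[:top_n]
    return list(query_terms) + ranked


# Hypothetical toy corpus: each document is a list of tokens.
DOCS = [
    ["query", "expansion", "retrieval", "search"],
    ["query", "reformulation", "search"],
    ["lexical", "synonym", "synonym"],
    ["retrieval", "search", "ranking"],
]

if __name__ == "__main__":
    candidates = ["search", "synonym", "reformulation", "ranking"]
    print(expand_query(["query", "retrieval"], candidates, DOCS))
    # prints ['query', 'retrieval', 'search', 'ranking']
```

In this toy run, "synonym" has the highest individual score but never co-occurs with the query terms, so the second pass drops it. That is the kind of case the query-level pass is meant to catch.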
Pages: 147-163
Number of pages: 17