Wikipedia-based query phrase expansion in patent class search

Cited by: 24
Authors
Al-Shboul, Bashar [1]
Myaeng, Sung-Hyon [2, 3]
Affiliations
[1] Univ Jordan, Dept Business Informat Technol, Amman 11942, Jordan
[2] Korea Adv Inst Sci & Technol, Dept Comp Sci, Taejon 305701, South Korea
[3] Korea Adv Inst Sci & Technol, Div Web Sci, Taejon 305701, South Korea
Source
INFORMATION RETRIEVAL | 2014 / Vol. 17 / Issue 5-6
Funding
National Research Foundation, Singapore;
Keywords
Patent search; Phrase-based query expansion; Wikipedia categories; Clarity; Retrievability; INFORMATION-RETRIEVAL; LEXICAL COHESION; TERMS;
DOI
10.1007/s10791-013-9233-4
Chinese Library Classification number
TP [Automation technology, computer technology];
Subject classification code
0812 ;
Abstract
Relevance feedback methods generally suffer from topic drift caused by word ambiguity and synonymy. Topic drift is an important issue in patent information retrieval because people tend to use different expressions to describe similar concepts, which lowers precision and recall at the same time. Furthermore, failing to retrieve patents relevant to an application during the examination process may lead to legal problems when a patent is granted for an already existing invention. A possible cause of topic drift is the use of a relevance feedback-based search method. To alleviate this inherent problem, we propose a novel query phrase expansion approach that uses semantic annotations in Wikipedia pages to enrich queries with phrases that disambiguate the original query words. The idea was implemented for patent search, where patents are classified into a hierarchy of categories. Analyses of the experimental results showed not only the positive roles of phrases and words in retrieving additional relevant documents through query expansion but also their contributions to alleviating the query drift problem. More specifically, our query expansion method was compared against the relevance-based language model, a state-of-the-art query expansion method, and showed superior mean average precision (MAP) at all levels of the classification hierarchy.
Pages: 430-451
Number of pages: 22