Wikipedia-based query phrase expansion in patent class search

Cited by: 23
Authors
Al-Shboul, Bashar [1 ]
Myaeng, Sung-Hyon [2 ,3 ]
Affiliations
[1] Univ Jordan, Dept Business Informat Technol, Amman 11942, Jordan
[2] Korea Adv Inst Sci & Technol, Dept Comp Sci, Taejon 305701, South Korea
[3] Korea Adv Inst Sci & Technol, Div Web Sci, Taejon 305701, South Korea
Source
INFORMATION RETRIEVAL | 2014 / Vol. 17 / Issue 5-6
Funding
National Research Foundation, Singapore;
Keywords
Patent search; Phrase-based query expansion; Wikipedia categories; Clarity; Retrievability; INFORMATION-RETRIEVAL; LEXICAL COHESION; TERMS;
DOI
10.1007/s10791-013-9233-4
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Relevance feedback methods generally suffer from topic drift caused by word ambiguity and synonymous uses of words. Topic drift is an important issue in patent information retrieval because people tend to use different expressions to describe similar concepts, which lowers precision and recall at the same time. Furthermore, failing to retrieve patents relevant to an application during the examination process may lead to legal problems if a patent is granted for an already existing invention. A possible cause of topic drift is the use of a relevance feedback-based search method. To alleviate this inherent problem, we propose a novel query phrase expansion approach that uses semantic annotations in Wikipedia pages to enrich queries with phrases that disambiguate the original query words. The idea was implemented for patent search, where patents are classified into a hierarchy of categories. Analyses of the experimental results showed not only the positive roles of phrases and words in retrieving additional relevant documents through query expansion but also their contributions to alleviating the query drift problem. More specifically, our query expansion method was compared against the relevance-based language model, a state-of-the-art query expansion method, and showed superior MAP at all levels of the classification hierarchy.
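The abstract reports results in terms of MAP (mean average precision). As a reading aid only, here is a minimal Python sketch of the standard MAP definition. It is not the authors' evaluation code, and the function names, query ids, and class codes are hypothetical placeholders chosen for illustration.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average precision of one ranked list against a set of relevant ids."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            # Precision at this rank, counted only where a relevant item occurs.
            precision_sum += hits / rank
    # Relevant items that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(
    runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]
) -> float:
    """Mean of per-query average precision over all queries in `runs`."""
    if not runs:
        return 0.0
    total = sum(
        average_precision(ranked, qrels.get(query_id, set()))
        for query_id, ranked in runs.items()
    )
    return total / len(runs)


# Hypothetical example: two queries, each ranking patent class codes.
example_runs = {"q1": ["A", "B", "C"], "q2": ["X", "B"]}
example_qrels = {"q1": {"A", "C"}, "q2": {"B"}}
example_map = mean_average_precision(example_runs, example_qrels)
```

In a class search setting, the ranked items would presumably be classification codes at a given level of the hierarchy, so MAP would be computed once per level.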
Pages: 430-451
Number of pages: 22