Semantic Search based on the Online Integration of NLP Techniques

Cited by: 7
Authors
Masuda, Katsuya [1 ]
Matsuzaki, Takuya [2 ]
Tsujii, Jun'ichi [3 ]
Affiliations
[1] Univ Tokyo, Ctr Knowledge Struct, Bunkyo Ku, Tokyo 1138656, Japan
[2] Univ Tokyo, Grad Sch Informat Sci & Technol, Dept Comp Sci, Tokyo 1138656, Japan
[3] Microsoft Res Asia, Beijing 100080, Peoples R China
Source
COMPUTATIONAL LINGUISTICS AND RELATED FIELDS | 2011 / Vol. 27
Keywords
Information Retrieval; Semantic Search; Tag Annotations; TEXT SEARCH; ALGEBRA;
DOI
10.1016/j.sbspro.2011.10.609
Chinese Library Classification (CLC) number
H0 [Linguistics];
Discipline classification code
030303 ; 0501 ; 050102 ;
Abstract
This paper introduces a framework for semantic information retrieval based on the integration of various natural language processing (NLP) techniques, each of which annotates a base text with different kinds of information extracted from the text. Instead of running the NLP modules on the fly for individual search requests, the NLP modules are applied to the text in advance and the results are indexed in a way that enables flexible and efficient integration of them. The query language is based on a variant of the region algebra, in which we can specify a substructure in the annotated text that may involve different kinds of annotations. Given a query, the retrieval engine searches for the substructure by aggregating the different kinds of annotations through a search algorithm for the extended region algebra. We demonstrate the effectiveness and flexibility of the proposed framework through experiments with TREC Genomics Track data. © 2011 Published by Elsevier Ltd. Selection and/or peer-review under responsibility of PACLING Organizing Committee.
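
The idea of querying pre-indexed annotations with region-algebra operators can be illustrated with a minimal Python sketch. It is not the paper's extended region algebra or its search algorithm; it only shows the basic containment operators in the style of Clarke et al. [3], applied to a tiny hand-made index. The layer names ("sentence", "protein", "disease"), the token offsets, and all function names are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# A region is a (start, end) pair of token offsets, end exclusive.
Region = Tuple[int, int]

# Hypothetical index built in advance: each annotation layer maps to the
# regions that some NLP module assigned to the base text.
INDEX: Dict[str, List[Region]] = {
    "sentence": [(0, 8), (8, 15)],
    "protein": [(1, 2), (5, 6), (9, 10)],
    "disease": [(6, 8)],
}


def contains(outer: Region, inner: Region) -> bool:
    """True if `inner` lies entirely within `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def containing(a: List[Region], b: List[Region]) -> List[Region]:
    """Regions of `a` that contain at least one region of `b`."""
    return [r for r in a if any(contains(r, s) for s in b)]


def contained_in(a: List[Region], b: List[Region]) -> List[Region]:
    """Regions of `a` that lie inside at least one region of `b`."""
    return [r for r in a if any(contains(s, r) for s in b)]


def sentences_with(index: Dict[str, List[Region]], *layers: str) -> List[Region]:
    """Sentences that contain an annotation from every listed layer."""
    result = index["sentence"]
    for layer in layers:
        result = containing(result, index[layer])
    return result


if __name__ == "__main__":
    hits = sentences_with(INDEX, "protein", "disease")
    print(hits)
    print(contained_in(INDEX["protein"], hits))
```

In this sketch, combining layers produced by different NLP modules is just a composition of operators over their region lists. The quadratic scans are for readability only; an efficient engine would exploit sorted region lists.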
Pages: 281-290
Number of pages: 10
Related papers
14 in total
[1]  
ALINK W, 2006, P 5 WORKSH INLP XML, P3
[2]   Automatic recognition of topic-classified relations between prostate cancer and genes using MEDLINE abstracts [J].
Chun, Hong-Woo ;
Tsuruoka, Yoshimasa ;
Kim, Jin-Dong ;
Shiba, Rie ;
Nagata, Naoki ;
Hishiki, Teruyoshi ;
Tsujii, Jun'ichi .
BMC BIOINFORMATICS, 2006, 7 (Suppl 3)
[3]   AN ALGEBRA FOR STRUCTURED TEXT SEARCH AND A FRAMEWORK FOR ITS IMPLEMENTATION [J].
CLARKE, CLA ;
CORMACK, GV ;
BURKOWSKI, FJ .
COMPUTER JOURNAL, 1995, 38 (01) :43-56
[4]   Building an example application with the Unstructured Information Management Architecture [J].
Ferrucci, D ;
Lally, A .
IBM SYSTEMS JOURNAL, 2004, 43 (03) :455-475
[5]  
Hersh W., 2005, TREC 2005 Notebook, V500-266, P14
[6]  
Hirohata K., 2008, P 3 INT JOINT C NAT, P381
[7]  
Kim JH, 2003, ELEC SOC S, V2003, P1
[8]   Corpus annotation for mining biomedical events from literature [J].
Kim, Jin-Dong ;
Ohta, Tomoko ;
Tsujii, Jun'ichi .
BMC BIOINFORMATICS, 2008, 9 (1)
[9]   Tag-Annotated Text Search Using Extended Region Algebra [J].
Masuda, Katsuya ;
Tsujii, Jun'ichi .
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2009, E92D (12) :2369-2377
[10]  
MIMA H, 2002, P COL 2002, P667