Searching structured documents

Cited by: 18
Authors
Trotman, A [1]
Affiliations
[1] Univ Otago, Dept Comp Sci, Dunedin, New Zealand
[2] Natl Lib Med, Natl Ctr Biotechnol Informat, Bethesda, MD 20894 USA
Keywords
structured information retrieval; indexing and searching; vector space; Boolean searching; SGML and XML
DOI
10.1016/S0306-4573(03)00041-4
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Structured document interchange formats such as XML and SGML are ubiquitous; however, information retrieval systems supporting structured searching are not. Structured searching can result in increased precision. A search for the author "Smith" in an unstructured corpus of documents specializing in iron-working could have a lower precision than a structured search for "Smith as author" in the same corpus. Analysis of XML retrieval languages identifies additional functionality that must be supported, including searching at, and broken across, multiple nodes in the document tree. A data structure is developed to support structured document searching. Application of this structure to information retrieval is then demonstrated. Document ranking is examined and adapted specifically for structured searching. Published by Elsevier Ltd.
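The precision claim in the abstract can be illustrated with a minimal sketch. This is not the data structure developed in the paper; it is a toy comparison under stated assumptions. The four-document corpus, the element names `doc`, `author` and `body`, and the helper names `unstructured_search`, `structured_search` and `precision` are all hypothetical and chosen only for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical toy corpus about iron-working; contents are invented.
CORPUS = {
    "d1": "<doc><author>Smith</author><body>Forging iron at high heat.</body></doc>",
    "d2": "<doc><author>Jones</author><body>The smith hammers the iron bar.</body></doc>",
    "d3": "<doc><author>Brown</author><body>A smith needs a good anvil.</body></doc>",
    "d4": "<doc><author>Lee</author><body>Casting bronze in sand moulds.</body></doc>",
}

# Assumed relevance judgement: only d1 was written by an author named Smith.
RELEVANT = {"d1"}


def tokens(text):
    """Lower-case word tokens of a string."""
    return re.findall(r"\w+", text.lower())


def unstructured_search(term):
    """Match the term anywhere in the document text, ignoring markup."""
    hits = set()
    for doc_id, xml in CORPUS.items():
        root = ET.fromstring(xml)
        if term.lower() in tokens(" ".join(root.itertext())):
            hits.add(doc_id)
    return hits


def structured_search(term, tag):
    """Match the term only inside elements with the given tag."""
    hits = set()
    for doc_id, xml in CORPUS.items():
        root = ET.fromstring(xml)
        for element in root.iter(tag):
            if term.lower() in tokens(element.text or ""):
                hits.add(doc_id)
                break
    return hits


def precision(hits):
    """Fraction of retrieved documents that are relevant."""
    return len(hits & RELEVANT) / len(hits) if hits else 0.0


if __name__ == "__main__":
    print(sorted(unstructured_search("smith")), precision(unstructured_search("smith")))
    print(sorted(structured_search("smith", "author")), precision(structured_search("smith", "author")))
```

In this toy corpus the unstructured query also retrieves documents that merely mention a smith in the body text, so its precision is lower than that of the query restricted to the `author` element.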
Pages: 619-632
Page count: 14