Content-Structure Correspondence: A Generic Representation for Heterogeneous Structured Document

Times cited: 0
Authors
Tan, Saravadee Sae [1]
Tang, Enya Kong [1]
Ranaivo-Malancon, Bali [1]
Affiliations
[1] Multimedia Univ, Fac Informat Technol, Selangor, Malaysia
Source
COMPUTATIONAL LINGUISTICS AND RELATED FIELDS | 2011 / Vol. 27
Keywords
Parsing; Subcategorization; PP attachment; Coordination attachment; Text understanding; Grammar writing;
DOI
10.1016/j.sbspro.2011.10.602
Chinese Library Classification (CLC) number
H0 [Linguistics];
Subject classification code
030303 ; 0501 ; 050102 ;
Abstract
On the web, most structured document collections consist of documents that come from different sources and are marked up with different types of structures. This diversity of structures has led to the emergence of heterogeneous structured documents. The heterogeneity of structured documents poses new challenges for document representation in structured document retrieval. The representation model needs to handle various types of structures as well as multiple structures in a single document. Furthermore, the same information may be represented in different structures, and information contained in different documents may be partial and inconsistent. Therefore, the linkage of semantically related elements in the document collections needs to be modelled in the representation model. In this paper, we introduce a generic and flexible structured document model to represent heterogeneous structured documents as well as the correspondences among similar elements in the document collections. (C) 2011 Published by Elsevier Ltd. Selection and/or peer-review under responsibility of PACLING Organizing Committee.
Pages: 226-232
Number of pages: 7