A new document representation using term frequency and vectorized graph connectionists with application to document retrieval

Cited by: 31
Authors
Chow, Tommy W. S. [1]
Zhang, Haijun [1]
Rahman, M. K. M. [1]
Affiliation
[1] City Univ Hong Kong, Dept Elect Engn, Kowloon, Hong Kong, Peoples R China
Keywords
Graph representation; Multiple features; Document retrieval; Self-organizing map
DOI
10.1016/j.eswa.2009.03.008
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a new document representation built from vectorized multiple features, including term frequency and term-connection frequency. Each document is represented by both an undirected graph and a directed graph. Terms and vectorized graph connectionists are then extracted from these graphs using several feature extraction methods. This hybrid feature representation reflects the underlying semantics more accurately than the term histograms currently in use, which struggle to capture them, and it facilitates the matching of complex graphs. At the application level, we develop a document retrieval system based on a self-organizing map (SOM) to speed up the retrieval process. Extensive experiments suggest that the proposed method is computationally efficient and accurate for document retrieval. (C) 2009 Elsevier Ltd. All rights reserved.
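As a rough illustration of combining term frequency with term-connection frequency, the sketch below derives both feature types from plain text and blends their similarities. It is not the authors' method: the tokenizer, the choice to connect only adjacent terms, the cosine measure, and the weights in the hypothetical `hybrid_similarity` function are all assumptions made for illustration, and the SOM-based retrieval stage is not shown.

```python
from collections import Counter
from math import sqrt

PUNCT = ".,;:!?()\"'"

def tokenize(text):
    # Assumed preprocessing: whitespace split, lowercase, strip simple punctuation.
    words = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in words if w]

def term_frequency(tokens):
    # Term histogram: how often each term occurs.
    return Counter(tokens)

def connection_frequency(tokens, directed):
    # Count graph edges between consecutive terms (window of 1 is an assumption).
    # Undirected edges ignore order; directed edges keep the (from, to) order.
    edges = Counter()
    for a, b in zip(tokens, tokens[1:]):
        if a == b:
            continue  # skip self-loops
        key = (a, b) if directed else tuple(sorted((a, b)))
        edges[key] += 1
    return edges

def cosine(u, v):
    # Cosine similarity between two sparse count vectors.
    dot = sum(count * v.get(key, 0) for key, count in u.items())
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def hybrid_similarity(doc_a, doc_b, weights=(0.5, 0.25, 0.25)):
    # Hypothetical weighted blend of term, undirected-edge and directed-edge similarity.
    ta, tb = tokenize(doc_a), tokenize(doc_b)
    w_tf, w_und, w_dir = weights
    return (w_tf * cosine(term_frequency(ta), term_frequency(tb))
            + w_und * cosine(connection_frequency(ta, False), connection_frequency(tb, False))
            + w_dir * cosine(connection_frequency(ta, True), connection_frequency(tb, True)))

print(hybrid_similarity("dog bites man", "man bites dog"))
```

For "dog bites man" against "man bites dog", the term histograms and the undirected edges match exactly, but the two documents share no directed edge, so the score is 0.75 up to floating-point rounding. A plain term histogram would rate the pair as identical; the directed-graph feature is what separates the two word orders.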
Pages: 12023-12035
Number of pages: 13
Related papers (50 in total)
  • [31] Impact of Document Representation on Neural Ad hoc Retrieval
    Bagheri, Ebrahim
    Ensan, Faezeh
    Al-Obeidat, Feras
    CIKM'18: PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2018, : 1635 - 1638
  • [32] Uniform Representation of Content and Structure for structured document retrieval
    Lalmas, M
    RESEARCH AND DEVELOPMENT IN INTELLIGENT SYSTEMS XVII, 2001, : 215 - 228
  • [33] New Approaches to Spoken Document Retrieval
    Martin Wechsler
    Eugen Munteanu
    Peter Schäuble
    Information Retrieval, 2000, 3 : 173 - 188
  • [35] Contrastive Document Representation Learning with Graph Attention Networks
    Xu, Peng
    Chen, Xinchi
    Ma, Xiaofei
    Huang, Zhiheng
    Xiang, Bing
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2021, 2021, : 3874 - 3884
  • [36] Key-region Detection for Document Images - Application to Administrative Document Retrieval
    Gao, Hongxing
    Rusinol, Marcal
    Karatzas, Dimosthenis
    Llados, Josep
    Sato, Tomokazu
    Iwamura, Masakazu
    Kise, Koichi
    2013 12TH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), 2013, : 230 - 234
  • [37] Graph-based Document Representation for Relation Extraction
    Cabaleiro, Bernardo
    Penas, Anselmo
    PROCESAMIENTO DEL LENGUAJE NATURAL, 2012, (49): : 57 - 64
  • [38] Keyphrase Graph in Text Representation for Document Similarity Measurement
    ThanhThuong T Huynh
    TruongAn Phamnguyen
    Nhon V Do
    KNOWLEDGE INNOVATION THROUGH INTELLIGENT SOFTWARE METHODOLOGIES, TOOLS AND TECHNIQUES (SOMET_20), 2020, 327 : 459 - 472
  • [39] CGTR: Convolution Graph Topology Representation for Document Ranking
    Qi, Yuanyuan
    Zhang, Jiayue
    Liu, Yansong
    Xu, Weiran
    Guo, Jun
    CIKM '20: PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, 2020, : 2173 - 2176
  • [40] High Performance in Minimizing of Term-Document Matrix Representation for Document Clustering
    Muflikhah, L.
    Baharudin, B.
    2009 CONFERENCE ON INNOVATIVE TECHNOLOGIES IN INTELLIGENT SYSTEMS AND INDUSTRIAL APPLICATIONS, 2009, : 225 - 229