A new document representation using term frequency and vectorized graph connectionists with application to document retrieval

Cited by: 31
Authors
Chow, Tommy W. S. [1]
Zhang, Haijun [1]
Rahman, M. K. M. [1]
Affiliation
[1] City Univ Hong Kong, Dept Elect Engn, Kowloon, Hong Kong, Peoples R China
Keywords
Graph representation; Multiple features; Document retrieval; Self-organizing map;
DOI
10.1016/j.eswa.2009.03.008
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper presents a new document representation with vectorized multiple features, including term frequency and term-connection-frequency. Each document is represented by both an undirected graph and a directed graph. Terms and vectorized graph connectionists are then extracted from the graphs by employing several feature extraction methods. This hybrid document feature representation reflects the underlying semantics more accurately than the currently used term histograms, which have difficulty capturing them, and it facilitates the matching of complex graphs. At the application level, we develop a document retrieval system based on the self-organizing map (SOM) to speed up the retrieval process. We perform extensive experimental verification, and the results suggest that the proposed method is computationally efficient and accurate for document retrieval. © 2009 Elsevier Ltd. All rights reserved.
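To make the two feature types concrete, the following is a minimal illustrative sketch, not the authors' method. It assumes that a "connection" is simply two terms appearing next to each other, that tokenization is whitespace splitting, and that the function names `term_features` and `vectorize` are hypothetical. The abstract does not specify the paper's actual graph construction or feature extraction methods.

```python
from collections import Counter


def term_features(text):
    """Count terms and term connections in one document.

    Assumption (not from the paper): a connection is a pair of
    adjacent tokens after lowercasing and whitespace splitting.
    """
    tokens = text.lower().split()
    # Term frequency: how often each term occurs.
    tf = Counter(tokens)
    # Directed graph edges: (a, b) means term a is followed by term b.
    directed = Counter(zip(tokens, tokens[1:]))
    # Undirected graph edges: merge (a, b) and (b, a) into one key.
    undirected = Counter()
    for (a, b), count in directed.items():
        undirected[tuple(sorted((a, b)))] += count
    return tf, directed, undirected


def vectorize(counter, keys):
    """Turn a Counter into a fixed-order vector over the given keys."""
    return [counter.get(key, 0) for key in keys]


doc = "term graph and graph term"
tf, directed, undirected = term_features(doc)

# Hybrid feature vector: term counts followed by connection counts.
vocabulary = sorted(tf)
edges = sorted(undirected)
hybrid = vectorize(tf, vocabulary) + vectorize(undirected, edges)
```

In this sketch the directed counts keep word order, so "term graph" and "graph term" are separate edges, while the undirected counts merge them. A retrieval system could compare such hybrid vectors between documents, although the SOM-based indexing mentioned in the abstract is not shown here.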
Pages: 12023-12035
Number of pages: 13