A new document representation using term frequency and vectorized graph connectionists with application to document retrieval

Cited by: 31
Authors
Chow, Tommy W. S. [1]
Zhang, Haijun [1]
Rahman, M. K. M. [1]
Affiliation
[1] City Univ Hong Kong, Dept Elect Engn, Kowloon, Hong Kong, Peoples R China
Keywords
Graph representation; Multiple features; Document retrieval; Self-organizing map
DOI
10.1016/j.eswa.2009.03.008
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a new document representation built from vectorized multiple features, namely term frequency and term-connection-frequency. Each document is represented by both an undirected graph and a directed graph. Terms and vectorized graph connectionists are then extracted from these graphs using several feature extraction methods. This hybrid document feature representation reflects the underlying semantics more accurately than the currently used term histograms can, and it facilitates the matching of complex graphs. At the application level, we develop a document retrieval system based on a self-organizing map (SOM) to speed up the retrieval process. Extensive experimental verification suggests that the proposed method is computationally efficient and accurate for document retrieval. (C) 2009 Elsevier Ltd. All rights reserved.
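To make the idea of combining term frequency with term-connection-frequency concrete, here is a minimal sketch. It is not the authors' implementation, and the abstract does not specify how the graphs are built. The sketch assumes that graph edges link consecutive non-stopword terms, that the term-connection-frequency of an edge is how often that edge occurs, and that documents are compared with cosine similarity. The tokenizer, the stopword list, and all function names are hypothetical. The paper uses a SOM to speed up retrieval; this sketch ranks documents by brute force instead.

```python
from collections import Counter
import math
import re

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"a", "an", "the", "of", "and", "to", "in", "is", "for", "on"}


def tokenize(text):
    """Lowercase, keep alphabetic runs, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def connection_frequency(tokens, directed):
    """Count edges between consecutive terms (assumed graph construction).

    Directed edges keep word order (a -> b); undirected edges ignore it.
    """
    edges = Counter()
    for a, b in zip(tokens, tokens[1:]):
        if a == b:
            continue  # skip self-loops
        edge = (a, b) if directed else tuple(sorted((a, b)))
        edges[edge] += 1
    return edges


def features(text):
    """Hybrid sparse vector: term counts plus undirected and directed edge counts.

    Keys are namespaced tuples so the three feature groups never collide.
    """
    tokens = tokenize(text)
    feats = Counter()
    for term, c in Counter(tokens).items():
        feats[("term", term)] = c
    for edge, c in connection_frequency(tokens, directed=False).items():
        feats[("undirected", edge)] = c
    for edge, c in connection_frequency(tokens, directed=True).items():
        feats[("directed", edge)] = c
    return feats


def cosine(u, v):
    """Cosine similarity between two sparse Counter vectors."""
    dot = sum(c * v[k] for k, c in u.items() if k in v)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def retrieve(query, corpus, k=3):
    """Brute-force ranking of (score, doc_index); the paper narrows this with a SOM."""
    q = features(query)
    scored = [(cosine(q, features(doc)), i) for i, doc in enumerate(corpus)]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return scored[:k]


if __name__ == "__main__":
    corpus = [
        "the dog chased the cat",
        "the cat chased the dog",
        "stock markets fell sharply",
    ]
    for score, idx in retrieve("dog chased cat", corpus):
        print(f"{score:.3f}  {corpus[idx]}")
```

With the query "dog chased cat", the first two documents have identical term histograms, so a term-only representation cannot tell them apart. Under the assumptions above, the directed edges break the tie: document 0 scores about 1.0 and document 1 about 0.71, while the unrelated document 2 scores 0. This is a toy illustration of why graph-derived features can carry information that term histograms lose.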
Pages: 12023-12035
Number of pages: 13