An encoding technique based on word importance for the clustering of web documents

Cited by: 0
Authors
Zakos, J [1]
Verma, B [1]
Affiliation
[1] Griffith Univ, Sch Informat Technol, Gold Coast, Qld 9726, Australia
Source
ICONIP'02: PROCEEDINGS OF THE 9TH INTERNATIONAL CONFERENCE ON NEURAL INFORMATION PROCESSING: COMPUTATIONAL INTELLIGENCE FOR THE E-AGE | 2002
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper we present a word encoding and clustering technique that groups web documents based on the importance of the words that appear in them. We use a two-level self-organizing map architecture to generate clusters of words and documents. We propose that, by capturing the importance of each word, similar documents can be clustered together to assist in web document retrieval. A web document retrieval system is presented to demonstrate how this approach could be integrated into web search.
Pages: 2207 - 2211
Page count: 5
Related papers
50 in total
  • [21] Fast fuzzy clustering of Web documents
    Wang, Jian-Hui
    Jiang, Long-Bin
    Yang, Shu
    Chang'an Daxue Xuebao (Ziran Kexue Ban)/Journal of Chang'an University (Natural Science Edition), 2007, 27 (02) : 107 - 110
  • [22] Clustering of Short Commercial Documents for the Web
    Carullo, Moreno
    Binaghi, Elisabetta
    Gallo, Ignazio
    Lamberti, Nicola
    19TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOLS 1-6, 2008, : 1873 - +
  • [23] Detecting Topics in Documents by Clustering Word Vectors
    de Miranda, Guilherme Raiol
    Pasti, Rodrigo
    de Castro, Leandro Nunes
    DISTRIBUTED COMPUTING AND ARTIFICIAL INTELLIGENCE, 16TH INTERNATIONAL CONFERENCE, 2020, 1003 : 235 - 243
  • [24] A weighted common structure based clustering technique for XML documents
    Hwang, Jeong Hee
    Ryu, Keun Ho
    JOURNAL OF SYSTEMS AND SOFTWARE, 2010, 83 (07) : 1267 - 1274
  • [25] A novel weighted phrase-based similarity for Web documents clustering
    Yang R.
    Zhu Q.
    Xia Y.
    Journal of Software, 2011, 6 (08) : 1521 - 1528
  • [26] TVS Based Technique for Efficient Web Document Clustering in Web Search
    Rajasekaran, R. Thalapathi
    Ramesh, R.
    Menaka, R.
    Vanisri, A.
    BIOSCIENCE BIOTECHNOLOGY RESEARCH COMMUNICATIONS, 2020, 13 (03) : 68 - 73
  • [27] Acyclic Word Graph for Web Clustering
    Moghrabi, Issam A. R.
    WORLD CONGRESS ON ENGINEERING AND COMPUTER SCIENCE, WCECS 2012, VOL I, 2012, : 477 - 481
  • [28] Clustering Retrieved Web Documents to Speed Up Web Searches
    Qumsiyeh, Rani
    Ng, Yiu-Kai
    COMPUTATIONAL SCIENCE AND ITS APPLICATIONS - ICCSA 2015, PT I, 2015, 9155 : 472 - 488
  • [29] Social Web Videos Clustering Based on Ensemble Technique
    Mekthanavanh, Vinath
    Li, Tianrui
    ROUGH SETS, (IJCRS 2016), 2016, 9920 : 449 - 458
  • [30] Clustering Web Documents with Tables for Information Extraction
    Shchekotykhin, Kostyantyn
    Jannach, Dietmar
    Friedrich, Gerhard
    K-CAP'07: PROCEEDINGS OF THE FOURTH INTERNATIONAL CONFERENCE ON KNOWLEDGE CAPTURE, 2007, : 169 - 170