An encoding technique based on word importance for the clustering of web documents

Cited by: 0
Authors
Zakos, J [1]
Verma, B [1]
Affiliations
[1] Griffith Univ, Sch Informat Technol, Gold Coast, Qld 9726, Australia
Source
ICONIP'02: PROCEEDINGS OF THE 9TH INTERNATIONAL CONFERENCE ON NEURAL INFORMATION PROCESSING: COMPUTATIONAL INTELLIGENCE FOR THE E-AGE | 2002
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper we present a word encoding and clustering technique that groups web documents based on the importance of the words that appear in them. We use a two-level self-organizing map architecture to generate clusters of words and documents. We propose that, by capturing the importance of words, similar documents can then be clustered to assist in web document retrieval. A web document retrieval system is presented to demonstrate how this approach could be integrated into web search.
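The abstract gives no implementation details, so the following is only a minimal sketch of how a two-level self-organizing map pipeline of this general kind might look, not the authors' method. Everything specific here is an assumption: the TF-IDF-style importance weight, the encoding of each word by its importance in every document, the 1-D map layout, and all names (`word_importance`, `train_som`, `best_node`, `cluster_documents`).

```python
import math
import random


def word_importance(docs):
    """Assumed importance measure: a TF-IDF-style weight per word per document."""
    n = len(docs)
    vocab = sorted({w for d in docs for w in d})
    df = {w: sum(1 for d in docs if w in d) for w in vocab}
    weights = []
    for d in docs:
        weights.append({
            w: (d.count(w) / len(d)) * (math.log(n / df[w]) + 1.0)
            for w in sorted(set(d))
        })
    return vocab, weights


def best_node(nodes, v):
    """Index of the map node closest to v (squared Euclidean distance)."""
    return min(
        range(len(nodes)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(nodes[j], v)),
    )


def train_som(vectors, n_nodes, epochs=50, seed=0):
    """Train a tiny 1-D self-organizing map and return its node weights."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    nodes = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = 0.5 * (1.0 - frac)                      # decaying learning rate
        radius = max(n_nodes / 2.0 * (1.0 - frac), 0.5)  # shrinking neighborhood
        for v in vectors:
            b = best_node(nodes, v)
            for j, w in enumerate(nodes):
                # Gaussian neighborhood around the best-matching node
                h = math.exp(-((j - b) ** 2) / (2.0 * radius ** 2))
                for k in range(dim):
                    w[k] += lr * h * (v[k] - w[k])
    return nodes


def cluster_documents(docs, n_word_nodes=3, n_doc_nodes=2):
    """Level 1 clusters words; level 2 clusters documents by word clusters."""
    vocab, weights = word_importance(docs)
    # Level 1: encode each word by its importance in every document.
    word_vecs = [[wt.get(w, 0.0) for wt in weights] for w in vocab]
    word_map = train_som(word_vecs, n_word_nodes)
    word_cluster = {w: best_node(word_map, v) for w, v in zip(vocab, word_vecs)}
    # Level 2: encode each document by summed importance per word cluster.
    doc_vecs = []
    for wt in weights:
        vec = [0.0] * n_word_nodes
        for w, imp in wt.items():
            vec[word_cluster[w]] += imp
        doc_vecs.append(vec)
    doc_map = train_som(doc_vecs, n_doc_nodes, seed=1)
    return word_cluster, [best_node(doc_map, v) for v in doc_vecs]
```

Calling `cluster_documents` on a list of tokenized documents returns a word-to-cluster mapping and one cluster index per document. Documents with identical token lists receive identical encodings and therefore land on the same map node.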
Pages: 2207 - 2211
Number of pages: 5
Related papers
50 in total
  • [1] Semantic based clustering of web documents
    Lin, TY
    Chiang, IJ
    2005 IEEE INTERNATIONAL CONFERENCE ON GRANULAR COMPUTING, VOLS 1 AND 2, 2005, : 189 - 192
  • [2] Clustering template based web documents
    Gottron, Thomas
    ADVANCES IN INFORMATION RETRIEVAL, 2008, 4956 : 40 - 51
  • [3] Effect of Multi-word Features on the Hierarchical Clustering of Web Documents
    Karthick, S.
    Shalinie, S. Mercy
    Eswarimeena, A. R.
    Madhumitha, P.
    Abhinaya, T. Naga
    2014 INTERNATIONAL CONFERENCE ON RECENT TRENDS IN INFORMATION TECHNOLOGY (ICRTIT), 2014,
  • [4] Clustering XML Documents for Web Based Learning
    Periakaruppan, Ramanathan
    Nadarajan, Rethinaswamy
    ADVANCES IN WEB-BASED LEARNING, 2015, 8390 : 234 - 243
  • [5] Clustering web documents based on knowledge granularity
    Huang, FL
    Zhang, SC
    FRONTIERS OF WWW RESEARCH AND DEVELOPMENT - APWEB 2006, PROCEEDINGS, 2006, 3841 : 85 - 96
  • [6] Link-Based Clustering Algorithm for Clustering Web Documents
    Ashokkumar, P.
    Don, S.
    JOURNAL OF TESTING AND EVALUATION, 2019, 47 (06) : 4096 - 4107
  • [7] Textual-based clustering of web documents
    Brzeminski, P
    Pedrycz, W
    INTERNATIONAL JOURNAL OF UNCERTAINTY FUZZINESS AND KNOWLEDGE-BASED SYSTEMS, 2004, 12 (06) : 715 - 743
  • [8] A Novel Indexing Technique for Web Documents using Hierarchical Clustering
    Gupta, Deepti
    Bhatia, Komal Kumar
    Sharma, A. K.
    INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY, 2009, 9 (09) : 168 - 175
  • [9] A word-based soft clustering algorithm for documents
    Lin, KI
    Kondadadi, R
    COMPUTERS AND THEIR APPLICATIONS, 2001, : 391 - 394
  • [10] A Method for Web Documents Clustering Based on Dynamic Concept
    Wang, Yunhua
    Ke, Huiyan
    2011 INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND NETWORK TECHNOLOGY (ICCSNT), VOLS 1-4, 2012, : 2183 - 2187