A similarity-based soft clustering algorithm for documents

Cited by: 0
Authors
Lin, KI [1 ]
Kondadadi, R [1 ]
Affiliation
[1] Memphis State Univ, Dept Math Sci, Memphis, TN 38152 USA
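Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract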
Document clustering is an important tool for applications such as Web search engines. Clustering documents gives users a good overall view of the information contained in their documents. However, existing algorithms have shortcomings: hard clustering algorithms (where each document belongs to exactly one cluster) cannot detect the multiple themes of a document, while soft clustering algorithms (where each document can belong to multiple clusters) are usually inefficient. We propose SISC (SImilarity-based Soft Clustering), an efficient soft clustering algorithm based on a given similarity measure. SISC requires only a similarity measure for clustering and uses randomization to keep the clustering efficient. Comparison with existing hard clustering algorithms such as K-means and its variants shows that SISC is both effective and efficient.
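
The hard versus soft distinction drawn in the abstract can be illustrated with a minimal sketch. This is not the SISC algorithm, whose steps (including its use of randomization) the abstract does not describe. It only shows how a document with two themes can be placed in several clusters when membership is decided by a similarity measure. The cosine measure, the `threshold` value, the example texts, and all function names are assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def bag(text):
    """Turn a text into a bag-of-words Counter (hypothetical helper)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def hard_assign(doc, centroids, similarity=cosine_similarity):
    """Hard clustering: return the index of the single most similar centroid."""
    return max(range(len(centroids)), key=lambda i: similarity(doc, centroids[i]))


def soft_assign(doc, centroids, similarity=cosine_similarity, threshold=0.3):
    """Soft clustering: return every centroid index at or above the threshold.

    The threshold of 0.3 is an assumed value, not one taken from the paper.
    """
    return [i for i, c in enumerate(centroids) if similarity(doc, c) >= threshold]


# Two assumed cluster centroids: finance (index 0) and sports (index 1).
centroids = [
    bag("stock market shares trading"),
    bag("football match goal league"),
]

# A document that touches both themes.
doc = bag("football club shares rise on stock market after league win")

print(hard_assign(doc, centroids))  # only the finance cluster
print(soft_assign(doc, centroids))  # both the finance and sports clusters
```

Any similarity function with the same two-argument shape could be passed in place of `cosine_similarity`, which mirrors the abstract's point that only a similarity measure is required.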
Pages: 40-47
Number of pages: 2