A similarity-based soft clustering algorithm for documents

Cited by: 0
Authors
Lin, KI [1 ]
Kondadadi, R [1 ]
Affiliation
[1] Memphis State Univ, Dept Math Sci, Memphis, TN 38152 USA
Keywords
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Document clustering is an important tool for applications such as Web search engines. Clustering documents gives the user a good overall view of the information contained in a document collection. However, existing algorithms have shortcomings: hard clustering algorithms (where each document belongs to exactly one cluster) cannot detect the multiple themes of a document, while soft clustering algorithms (where each document can belong to multiple clusters) are usually inefficient. We propose SISC (SImilarity-based Soft Clustering), an efficient soft clustering algorithm based on a given similarity measure. SISC requires only a similarity measure for clustering and uses randomization to make the clustering efficient. Comparison with existing hard clustering algorithms such as K-means and its variants shows that SISC is both effective and efficient.
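The difference between hard and soft clustering described in the abstract can be illustrated with a small sketch. This is not the SISC algorithm, whose details are not given in this record; it only assumes a cosine similarity over word counts, fixed hypothetical cluster centers, and an arbitrary membership threshold of 0.2, and it omits the randomization step entirely.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hard_assign(docs, centers, similarity):
    """Hard clustering: each document gets exactly one cluster index."""
    return [
        max(range(len(centers)), key=lambda j: similarity(d, centers[j]))
        for d in docs
    ]


def soft_assign(docs, centers, similarity, threshold):
    """Soft clustering: a document joins every cluster it is similar enough to.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    return [
        [j for j, c in enumerate(centers) if similarity(d, c) >= threshold]
        for d in docs
    ]


# Toy documents; the third one mixes a finance theme and a sports theme.
docs = [
    Counter(text.split())
    for text in (
        "stock market prices",
        "football match score",
        "football club stock prices",
    )
]
# Hypothetical cluster centers, one per theme.
centers = [
    Counter("stock market prices".split()),
    Counter("football match score".split()),
]

hard = hard_assign(docs, centers, cosine)
soft = soft_assign(docs, centers, cosine, threshold=0.2)
print("hard:", hard)
print("soft:", soft)
```

In this toy setup the mixed document is forced into a single cluster by the hard assignment, while the soft assignment places it in both, which is the multiple-themes behavior the abstract attributes to soft clustering. Only the similarity function is needed to make these assignments, so any other measure could be passed in its place.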
Pages: 40-47
Page count: 2