Scalable spectral clustering with cosine similarity

Cited by: 0
Authors
Chen, Guangliang [1]
Affiliations
[1] San Jose State Univ, Dept Math & Stat, San Jose, CA 95192 USA
Source
2018 24TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR) | 2018
Keywords
DATA SETS;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We propose a unified scalable computing framework for three versions of spectral clustering: Normalized Cut (Shi and Malik, 2000), the Ng-Jordan-Weiss (NJW) algorithm (2001), and Diffusion Maps (Coifman and Lafon, 2006), in the setting of cosine similarity. We assume that the input data is either sparse (e.g., a document-term frequency matrix) or of only a few hundred dimensions (e.g., small images or data obtained through PCA). We show that in such cases, spectral clustering can be implemented solely through efficient operations on the data matrix, such as elementwise manipulation, matrix-vector multiplication and low-rank SVD, thus entirely avoiding the weight matrix. Our algorithm is simple to implement, fast to run, accurate and robust to outliers. We demonstrate its superior performance through extensive experiments that compare our scalable algorithm with the plain implementation on several benchmark data sets.
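The central idea of the abstract, that spectral clustering under cosine similarity can run on the data matrix alone, can be illustrated with a minimal sketch. It is not the paper's exact procedure. It assumes nonnegative data with no zero rows, a weight matrix W = X X^T - I built from L2-normalized rows, and an NJW-style embedding that ignores the small diagonal correction D^{-1}. The function name `cosine_spectral_embedding` and the dense `np.linalg.svd` call are illustrative choices, not taken from the paper.

```python
import numpy as np

def cosine_spectral_embedding(X, k):
    """Return (degrees, embedding) without forming the n-by-n weight matrix.

    Assumption: X is a dense (n, d) array with nonnegative entries,
    no zero rows, and every point has positive total similarity to the rest.
    """
    X = np.asarray(X, dtype=float)
    # L2-normalize rows so that X @ X.T would hold the cosine similarities.
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    # Degrees of W = X X^T - I from two matrix-vector products:
    # d = X (X^T 1) - 1, where X^T 1 is the vector of column sums.
    degrees = X @ X.sum(axis=0) - 1.0
    # Row scaling: Xt = D^{-1/2} X, hence Xt Xt^T = D^{-1/2} X X^T D^{-1/2}.
    Xt = X / np.sqrt(degrees)[:, None]
    # The top-k left singular vectors of Xt are the top-k eigenvectors
    # of Xt Xt^T, so the n-by-n matrix is never needed.
    U, _, _ = np.linalg.svd(Xt, full_matrices=False)
    U = U[:, :k]
    # NJW-style row normalization; the rows can then be passed to k-means.
    embedding = U / np.linalg.norm(U, axis=1, keepdims=True)
    return degrees, embedding

# Toy data (hypothetical): two groups of three nonnegative vectors.
X = np.array([[5, 4, 0, 0, 1], [4, 5, 0, 1, 0], [5, 5, 1, 0, 0],
              [0, 0, 1, 5, 4], [0, 1, 0, 4, 5], [1, 0, 0, 5, 5]], dtype=float)
degrees, embedding = cosine_spectral_embedding(X, k=2)
```

For large sparse inputs, the dense SVD would presumably be swapped for a truncated solver such as `scipy.sparse.linalg.svds`, which is the kind of low-rank SVD the abstract refers to.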
Pages: 314-319
Number of pages: 6
Related papers
7 in total
  • [1] SpaRC: scalable sequence clustering using Apache Spark
    Shi, Lizhen
    Meng, Xiandong
    Tseng, Elizabeth
    Mascagni, Michael
    Wang, Zhong
    BIOINFORMATICS, 2019, 35 (05) : 760 - 768
  • [2] SWIFT: SCALABLE WEIGHTED ITERATIVE SAMPLING FOR FLOW CYTOMETRY CLUSTERING
    Naim, Iftekhar
    Datta, Suprakash
    Sharma, Gaurav
    Cavenaugh, James S.
    Mosmann, Tim R.
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 509 - 512
  • [3] AnatomiCuts: Hierarchical clustering of tractography streamlines based on anatomical similarity
    Siless, Viviana
    Chang, Ken
    Fischl, Bruce
    Yendiki, Anastasia
    NEUROIMAGE, 2018, 166 : 32 - 45
  • [4] COIN: Correlation Index-Based Similarity Measure for Clustering Categorical Data
    Sowmiya, N.
    Gupta, N. Srinivasa
    Natarajan, Elango
    Valarmathi, B.
    Elamvazuthi, I.
    Parasuraman, S.
    Kit, Chun Ang
    Freitas, Lidio Inacio
    Abraham Gnanamuthu, Ezra Morris
    MATHEMATICAL PROBLEMS IN ENGINEERING, 2022, 2022
  • [5] Similarity measure and domain adaptation in multiple mixture model clustering: An application to image processing
    Leong, Siow Hoo
    Ong, Seng Huat
    PLOS ONE, 2017, 12 (07):
  • [6] A Scalable Data Chunk Similarity Based Compression Approach for Efficient Big Sensing Data Processing on Cloud
    Yang, Chi
    Chen, Jinjun
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2017, 29 (06) : 1144 - 1157
  • [7] Multiclass imbalanced learning with one-versus-one decomposition and spectral clustering
    Li, Qianmu
    Song, Yanjun
    Zhang, Jing
    Sheng, Victor S.
    EXPERT SYSTEMS WITH APPLICATIONS, 2020, 147 (147)