A new similarity combining reconstruction coefficient with pairwise distance for agglomerative clustering

Cited by: 39
Authors
Cai, Zhiling [1]
Yang, Xiaofei [1]
Huang, Tianyi [1]
Zhu, William [1]
Affiliations
[1] Univ Elect Sci & Technol China, Lab GRC & AI, Chengdu, Sichuan, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Unsupervised learning; Clustering; Similarity graph; Pairwise distance based similarity; Reconstruction coefficient based similarity; ALGORITHM; GRAPH;
DOI
10.1016/j.ins.2019.08.048
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Agglomerative clustering is a mainstream clustering method that can produce an informative hierarchical structure of clusters. Existing similarities in agglomerative clustering are typically based on the pairwise distance. Although this type of similarity captures the local structure of data well, it is sensitive to noise and outliers because it considers only the distance between data points. In this paper, we propose a new similarity called RCPD by combining the reconstruction coefficient, which is robust to noise and outliers, with the pairwise distance for agglomerative clustering. Our new similarity takes advantage of both the distance between data points and the linear representation among data points. Thus, RCPD not only captures the local structure of data well but is also robust to noise and outliers. The experimental results on 11 real-world benchmark datasets show that our new clustering method consistently outperforms many state-of-the-art clustering approaches. (C) 2019 Elsevier Inc. All rights reserved.
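The abstract does not give the RCPD formula, so the following is only an illustrative sketch of the general idea of combining the two kinds of similarity, not the authors' method. It assumes a Gaussian kernel for the pairwise-distance part, ridge-regularized self-representation for the reconstruction-coefficient part, and an element-wise product to combine them. The function names and the parameters `sigma` and `lam` are hypothetical.

```python
import numpy as np


def pairwise_distance_similarity(X, sigma=1.0):
    """Gaussian-kernel similarity from squared Euclidean distances (assumed form)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    S = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(S, 0.0)  # no self-similarity
    return S


def reconstruction_coefficient_similarity(X, lam=0.1):
    """Symmetrized |coefficients| from ridge regression of each point on the others.

    For point i, solve min_c ||x_i - sum_j c_j x_j||^2 + lam * ||c||^2 over j != i.
    This is an assumed stand-in for the paper's reconstruction coefficient.
    """
    n = X.shape[0]
    G = X @ X.T  # Gram matrix of the data points (rows of X)
    C = np.zeros((n, n))
    for i in range(n):
        idx = np.delete(np.arange(n), i)
        A = G[np.ix_(idx, idx)] + lam * np.eye(n - 1)
        C[i, idx] = np.linalg.solve(A, G[idx, i])
    absC = np.abs(C)
    return (absC + absC.T) / 2.0


def combined_similarity(X, sigma=1.0, lam=0.1):
    """Hypothetical combination: element-wise product of the two similarities."""
    return pairwise_distance_similarity(X, sigma) * reconstruction_coefficient_similarity(X, lam)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(6, 3))  # 6 points in 3 dimensions
    W = combined_similarity(X)
    print(W.shape)
```

Under these assumptions, a pair of points receives a high combined score only when the points are close and also contribute to each other's linear reconstruction. The resulting matrix could then serve as the similarity graph for an agglomerative merging procedure.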
Pages: 173-182
Number of pages: 10