Exemplar-based low-rank matrix decomposition for data clustering

Cited by: 5
Authors
Wang, Lijun [1]
Dong, Ming [1]
Affiliations
[1] Wayne State Univ, Dept Comp Sci, Detroit, MI 48201 USA
Keywords
Clustering; Low-rank approximation; Subspaces; Matrix decomposition; Monte-Carlo algorithms; Nystrom method; Approximations; Factorization; Computation
DOI
10.1007/s10618-014-0347-0
CLC Classification
TP18 [Artificial Intelligence Theory]
Subject Classification
081104; 0812; 0835; 1405
Abstract
Today, digital data is accumulated faster than ever in science, engineering, biomedicine, and real-world sensing. The ubiquitous combination of massive data and sparse information poses considerable challenges for data mining research. In this paper, we propose a theoretical framework, Exemplar-based low-rank sparse Matrix Decomposition (EMD), for clustering large-scale datasets. Capitalizing on recent advances in matrix approximation and decomposition, EMD can efficiently partition datasets of large dimension and scalable size. Specifically, given a data matrix, EMD first computes a representative data subspace and a near-optimal low-rank approximation. The cluster centroids and indicators are then obtained through a matrix decomposition in which the cluster centroids are constrained to lie within the representative data subspace. By selecting representative exemplars, we obtain a compact "sketch" of the data, which makes the clustering highly efficient and robust to noise. In addition, the clustering results are sparse and easy to interpret. From a theoretical perspective, we prove the correctness and convergence of the EMD algorithm and provide a detailed analysis of its efficiency, including running time and space requirements. Through extensive experiments on both synthetic and real datasets, we demonstrate the performance of EMD for clustering large-scale data.
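The pipeline described in the abstract (select exemplars that span a representative subspace, then decompose the data with centroids constrained to that subspace) can be sketched as follows. This is only a toy illustration under stated assumptions: it uses uniform column sampling and a plain alternating least-squares update, not the paper's near-optimal column-selection scheme or its actual update rules, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: n points in d dimensions drawn from k well-separated clusters.
d, n, k = 20, 150, 3
centers = rng.normal(size=(d, k)) * 5
labels_true = rng.integers(0, k, size=n)
X = centers[:, labels_true] + rng.normal(scale=0.5, size=(d, n))

# Step 1: pick m exemplar columns as a compact "sketch" of the data
# (uniform sampling here; EMD uses a near-optimal low-rank selection).
m = 30
exemplars = rng.choice(n, size=m, replace=False)
E = X[:, exemplars]                      # d x m exemplar subspace basis

# Step 2: decompose X ≈ (E W) H, so the centroids C = E W are forced
# to lie in the exemplar subspace; solved by naive alternating least
# squares with clipping to keep the indicator matrix H nonnegative.
W = rng.random((m, k))
H = rng.random((k, n))
for _ in range(100):
    C = E @ W                            # current centroids, d x k
    H = np.clip(np.linalg.lstsq(C, X, rcond=None)[0], 0, None)
    W = np.linalg.lstsq(E, X @ np.linalg.pinv(H), rcond=None)[0]

pred = H.argmax(axis=0)                  # hard cluster assignments, length n
```

Because the centroids are linear combinations of a few exemplar columns, the factorization touches only the d x m sketch `E` rather than the full data matrix, which is the source of the efficiency and interpretability claims in the abstract.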
Pages: 324-357 (34 pages)
Related Papers (50 total)
  • [31] Parameterized low-rank binary matrix approximation
    Fomin, Fedor V.
    Golovach, Petr A.
    Panolan, Fahad
    DATA MINING AND KNOWLEDGE DISCOVERY, 2020, 34 (02) : 478 - 532
  • [32] DOA Estimation of Coherent Sources via Low-Rank Matrix Decomposition
    Yang, Zeqi
    Ma, Shuai
    Liu, Yiheng
    Zhang, Hua
    Lyu, Xiaode
    IEEE WIRELESS COMMUNICATIONS LETTERS, 2024, 13 (11) : 3049 - 3053
  • [33] Robust low-rank diffraction separation and imaging by CUR matrix decomposition
    Lin, Peng
    Peng, Suping
    Xiang, Yang
    Li, Chuangjian
    Cui, Xiaoqin
    GEOPHYSICS, 2023, 88 (06) : V415 - V429
  • [34] A Nonnegative Projection Based Algorithm for Low-Rank Nonnegative Matrix Approximation
    Wang, Peitao
    He, Zhaoshui
    Xie, Kan
    Gao, Junbin
    Antolovich, Michael
    NEURAL INFORMATION PROCESSING, ICONIP 2017, PT I, 2017, 10634 : 240 - 247
  • [36] Graph regularized low-rank representation for submodule clustering
    Wu, Tong
    PATTERN RECOGNITION, 2020, 100
  • [37] GRAPH-REGULARIZED FAST LOW-RANK MATRIX APPROXIMATION USING THE NYSTROM METHOD FOR CLUSTERING
    Lee, Jieun
    Choe, Yoonsik
    2018 IEEE 28TH INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP), 2018,
  • [38] Low-rank kernel learning for graph-based clustering
    Kang, Zhao
    Wen, Liangjian
    Chen, Wenyu
    Xu, Zenglin
    KNOWLEDGE-BASED SYSTEMS, 2019, 163 : 510 - 517
  • [39] Multilayer Sparsity-Based Tensor Decomposition for Low-Rank Tensor Completion
    Xue, Jize
    Zhao, Yongqiang
    Huang, Shaoguang
    Liao, Wenzhi
    Chan, Jonathan Cheung-Wai
    Kong, Seong G.
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (11) : 6916 - 6930
  • [40] Schur complement-based domain decomposition preconditioners with low-rank corrections
    Li, Ruipeng
    Xi, Yuanzhe
    Saad, Yousef
    NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS, 2016, 23 (04) : 706 - 729