Exemplar-based low-rank matrix decomposition for data clustering

被引:5
作者
Wang, Lijun [1 ]
Dong, Ming [2 ]
机构
[1] Wayne State Univ, Dept Comp Sci, Detroit, MI 48201 USA
[2] Wayne State Univ, Dept Comp Sci, Detroit, MI 48201 USA
关键词
Clustering; Low-rank approximation; Subspaces; Matrix decomposition; MONTE-CARLO ALGORITHMS; NYSTROM METHOD; APPROXIMATIONS; FACTORIZATION; COMPUTATION;
D O I
10.1007/s10618-014-0347-0
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Today, digital data is accumulated at a faster than ever speed in science, engineering, biomedicine, and real-world sensing. The ubiquitous phenomenon of massive data and sparse information imposes considerable challenges in data mining research. In this paper, we propose a theoretical framework, Exemplar-based low-rank sparse matrix decomposition (EMD), to cluster large-scale datasets. Capitalizing on recent advances in matrix approximation and decomposition, EMD can partition datasets with large dimensions and scalable sizes efficiently. Specifically, given a data matrix, EMD first computes a representative data subspace and a near-optimal low-rank approximation. Then, the cluster centroids and indicators are obtained through matrix decomposition, in which we require that the cluster centroids lie within the representative data subspace. By selecting the representative exemplars, we obtain a compact "sketch"of the data. This makes the clustering highly efficient and robust to noise. In addition, the clustering results are sparse and easy for interpretation. From a theoretical perspective, we prove the correctness and convergence of the EMD algorithm, and provide detailed analysis on its efficiency, including running time and spatial requirements. Through extensive experiments performed on both synthetic and real datasets, we demonstrate the performance of EMD for clustering large-scale data.
引用
收藏
页码:324 / 357
页数:34
相关论文
共 50 条
  • [21] Dictionary optimization and clustering for exemplar-based voice conversion
    Sun, Wei
    Yu, Yibiao
    FIFTH INTERNATIONAL WORKSHOP ON PATTERN RECOGNITION, 2020, 11526
  • [22] EFFICIENT BACKGROUND SUBTRACTION WITH LOW-RANK AND SPARSE MATRIX DECOMPOSITION
    Ebadi, Salehe Erfanian
    Ones, Valia Guerra
    Izquierdo, Ebroul
    2015 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2015, : 4863 - 4867
  • [23] A Novel False Data Injection Attack Formulation Based on CUR Low-Rank Decomposition Method
    Mukherjee, Debottam
    Ghosh, Sandip
    Misra, Rakesh Kumar
    IEEE TRANSACTIONS ON SMART GRID, 2022, 13 (06) : 4965 - 4968
  • [24] A novel speech enhancement method based on constrained low-rank and sparse matrix decomposition
    Sun, Chengli
    Zhu, Qi
    Wan, Minghua
    SPEECH COMMUNICATION, 2014, 60 : 44 - 55
  • [25] Rank Consistency Induced Multiview Subspace Clustering via Low-Rank Matrix Factorization
    Guo, Jipeng
    Sun, Yanfeng
    Gao, Junbin
    Hu, Yongli
    Yin, Baocai
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (07) : 3157 - 3170
  • [26] SymNMF: nonnegative low-rank approximation of a similarity matrix for graph clustering
    Da Kuang
    Sangwoon Yun
    Haesun Park
    Journal of Global Optimization, 2015, 62 : 545 - 574
  • [27] Randomized two-sided subspace iteration for low-rank matrix and tensor decomposition
    Kaloorazi, M. F.
    Ahmadi-Asl, S.
    Rahardja, S.
    DIGITAL SIGNAL PROCESSING, 2024, 149
  • [28] Double Low-Rank Matrix Decomposition for Hyperspectral Image Denoising and Destriping
    Zhang, Hongyan
    Cai, Jingyi
    He, Wei
    Shen, Huanfeng
    Zhang, Liangpei
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2022, 60
  • [29] SPARSE AND LOW-RANK MATRIX DECOMPOSITION VIA ALTERNATING DIRECTION METHOD
    Yuan, Xiaoming
    Yang, Junfeng
    PACIFIC JOURNAL OF OPTIMIZATION, 2013, 9 (01): : 167 - 180
  • [30] RANDOMIZED QUATERNION SINGULAR VALUE DECOMPOSITION FOR LOW-RANK MATRIX APPROXIMATION
    LIU, Q. I. A. O. H. U. A.
    LING, S. I. T. A. O.
    JIA, Z. H. I. G. A. N. G.
    SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2022, 44 (02) : A870 - A900