Exemplar-based low-rank matrix decomposition for data clustering

Cited by: 5
Authors
Wang, Lijun [1 ]
Dong, Ming [2 ]
Affiliations
[1] Wayne State Univ, Dept Comp Sci, Detroit, MI 48201 USA
[2] Wayne State Univ, Dept Comp Sci, Detroit, MI 48201 USA
Keywords
Clustering; Low-rank approximation; Subspaces; Matrix decomposition; MONTE-CARLO ALGORITHMS; NYSTROM METHOD; APPROXIMATIONS; FACTORIZATION; COMPUTATION;
DOI
10.1007/s10618-014-0347-0
CLC Number
TP18 [Artificial intelligence theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Today, digital data accumulates at an ever-faster pace in science, engineering, biomedicine, and real-world sensing. This ubiquitous combination of massive data and sparse information poses considerable challenges for data mining research. In this paper, we propose a theoretical framework, Exemplar-based low-rank sparse matrix decomposition (EMD), to cluster large-scale datasets. Capitalizing on recent advances in matrix approximation and decomposition, EMD can efficiently partition datasets of high dimensionality and large size. Specifically, given a data matrix, EMD first computes a representative data subspace and a near-optimal low-rank approximation. The cluster centroids and indicators are then obtained through matrix decomposition, in which the cluster centroids are required to lie within the representative data subspace. By selecting representative exemplars, we obtain a compact "sketch" of the data, which makes the clustering highly efficient and robust to noise. In addition, the clustering results are sparse and easy to interpret. From a theoretical perspective, we prove the correctness and convergence of the EMD algorithm and provide a detailed analysis of its efficiency, including running time and space requirements. Through extensive experiments on both synthetic and real datasets, we demonstrate the performance of EMD for clustering large-scale data.
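The two-step procedure described in the abstract (select representative exemplar columns, then decompose the data matrix with centroids constrained to the exemplar subspace) can be illustrated with a minimal sketch. This is not the authors' algorithm: the norm-proportional column sampling, the alternating least-squares updates with crude nonnegativity clipping, and the function name emd_sketch are all assumptions made for illustration only.

```python
# Minimal sketch of exemplar-constrained low-rank clustering (illustrative only).
# Assumptions: norm-based column sampling stands in for exemplar selection, and
# clipped least-squares stands in for a proper constrained decomposition.
import numpy as np

def emd_sketch(X, n_exemplars, n_clusters, n_iter=100, seed=0):
    """Cluster the columns of X (features x samples) with cluster centroids
    constrained to the span of a small set of exemplar columns."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]

    # 1. Pick exemplar columns with probability proportional to squared norm
    #    (a common randomized column-sampling heuristic).
    p = (X ** 2).sum(axis=0)
    p = p / p.sum()
    idx = rng.choice(n, size=n_exemplars, replace=False, p=p)
    C = X[:, idx]                      # exemplar columns: a compact "sketch" of the data

    # 2. Alternate between cluster indicators H and weights W, so that the
    #    centroids C @ W lie in the exemplar subspace and C @ W @ H ~ X.
    W = rng.random((n_exemplars, n_clusters))
    H = rng.random((n_clusters, n))
    for _ in range(n_iter):
        A = C @ W                                        # current centroids
        H, *_ = np.linalg.lstsq(A, X, rcond=None)        # solve A @ H ~ X
        H = np.clip(H, 0, None)                          # keep indicators nonnegative
        B, *_ = np.linalg.lstsq(C, X @ np.linalg.pinv(H), rcond=None)
        W = np.clip(B, 0, None)                          # keep weights nonnegative

    labels = H.argmax(axis=0)          # hard cluster assignment per sample
    return labels, C @ W               # labels and centroids in the exemplar span

# Usage example: three Gaussian blobs in 50 dimensions.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    blobs = [rng.normal(loc=mu, size=(50, 100)) for mu in (-3, 0, 3)]
    X = np.hstack(blobs)
    labels, centroids = emd_sketch(X, n_exemplars=10, n_clusters=3)
    print(labels[:10], labels[100:110], labels[200:210])
```

The clipping step is only a placeholder; a faithful implementation would use nonnegative least squares or multiplicative updates to enforce the indicator constraints.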
Pages: 324-357
Page count: 34
Related Papers
50 records in total
  • [1] Exemplar-based low-rank matrix decomposition for data clustering
    Lijun Wang
    Ming Dong
    Data Mining and Knowledge Discovery, 2015, 29 : 324 - 357
  • [2] Exemplar-based large-scale low-rank matrix decomposition for collaborative prediction
    Lei, Hengxin
    Liu, Jinglei
    Yu, Yong
    INTERNATIONAL JOURNAL OF COMPUTER MATHEMATICS, 2023, 100 (03) : 615 - 640
  • [3] LOW-RANK MATRIX APPROXIMATION BASED ON INTERMINGLED RANDOMIZED DECOMPOSITION
    Kaloorazi, Maboud F.
    Chen, Jie
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 7475 - 7479
  • [4] Subspace-Orbit Randomized-Based Decomposition for Low-Rank Matrix Approximations
    Kaloorazi, Maboud F.
    De Lamare, Rodrigo C.
    2018 26TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2018, : 2618 - 2622
  • [5] Randomized Rank-Revealing QLP for Low-Rank Matrix Decomposition
    Kaloorazi, Maboud F.
    Liu, Kai
    Chen, Jie
    De Lamare, Rodrigo C.
    Rahardja, Susanto
    IEEE ACCESS, 2023, 11 : 63650 - 63666
  • [6] LOW-RANK AND SPARSE MATRIX DECOMPOSITION-BASED PAN SHARPENING
    Rong, Kaixuan
    Wang, Shuang
    Zhang, Xiaohua
    Hou, Biao
    2012 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS), 2012, : 2276 - 2279
  • [7] GoDec+: Fast and Robust Low-Rank Matrix Decomposition Based on Maximum Correntropy
    Guo, Kailing
    Liu, Liu
    Xu, Xiangmin
    Xu, Dong
    Tao, Dacheng
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2018, 29 (06) : 2323 - 2336
  • [8] Speech Enhancement Based on Dictionary Learning and Low-Rank Matrix Decomposition
    Ji, Yunyun
    Zhu, Wei-Ping
    Champagne, Benoit
    IEEE ACCESS, 2019, 7 : 4936 - 4947
  • [9] Online Tensor Low-Rank Representation for Streaming Data Clustering
    Wu, Tong
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (02) : 602 - 617
  • [10] BAND SELECTION OF HYPERSPECTRAL DATA WITH LOW-RANK DOUBLY STOCHASTIC MATRIX DECOMPOSITION
    Li, Jiming
    2016 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS), 2016, : 44 - 47