Exemplar-based low-rank matrix decomposition for data clustering

Cited by: 5
Authors
Wang, Lijun [1]
Dong, Ming [1]
Affiliations
[1] Wayne State Univ, Dept Comp Sci, Detroit, MI 48201 USA
Keywords
Clustering; Low-rank approximation; Subspaces; Matrix decomposition; Monte-Carlo algorithms; Nystrom method; Approximations; Factorization; Computation
DOI
10.1007/s10618-014-0347-0
CLC Classification
TP18 [Artificial Intelligence Theory]
Subject Classification
081104; 0812; 0835; 1405
Abstract
Today, digital data is accumulated faster than ever in science, engineering, biomedicine, and real-world sensing. The ubiquitous combination of massive data and sparse information poses considerable challenges for data mining research. In this paper, we propose a theoretical framework, Exemplar-based low-rank sparse Matrix Decomposition (EMD), for clustering large-scale datasets. Capitalizing on recent advances in matrix approximation and decomposition, EMD can efficiently partition datasets of large dimension and scalable size. Specifically, given a data matrix, EMD first computes a representative data subspace and a near-optimal low-rank approximation. The cluster centroids and indicators are then obtained through a matrix decomposition in which the cluster centroids are constrained to lie within the representative data subspace. By selecting representative exemplars, we obtain a compact "sketch" of the data, which makes the clustering highly efficient and robust to noise. In addition, the clustering results are sparse and easy to interpret. From a theoretical perspective, we prove the correctness and convergence of the EMD algorithm and provide a detailed analysis of its efficiency, including running time and space requirements. Through extensive experiments on both synthetic and real datasets, we demonstrate the performance of EMD for clustering large-scale data.
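The pipeline described in the abstract (select exemplars that span a representative subspace, then decompose the data with centroids constrained to that subspace) can be sketched as follows. This is only a toy illustration under stated assumptions: it uses uniform column sampling and a plain alternating least-squares update, not the paper's near-optimal column-selection scheme or its actual update rules, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: n points in d dimensions drawn from k well-separated clusters.
d, n, k = 20, 150, 3
centers = rng.normal(size=(d, k)) * 5
labels_true = rng.integers(0, k, size=n)
X = centers[:, labels_true] + rng.normal(scale=0.5, size=(d, n))

# Step 1: pick m exemplar columns as a compact "sketch" of the data
# (uniform sampling here; EMD uses a near-optimal low-rank selection).
m = 30
exemplars = rng.choice(n, size=m, replace=False)
E = X[:, exemplars]                      # d x m exemplar subspace basis

# Step 2: decompose X ≈ (E W) H, so the centroids C = E W are forced
# to lie in the exemplar subspace; solved by naive alternating least
# squares with clipping to keep the indicator matrix H nonnegative.
W = rng.random((m, k))
H = rng.random((k, n))
for _ in range(100):
    C = E @ W                            # current centroids, d x k
    H = np.clip(np.linalg.lstsq(C, X, rcond=None)[0], 0, None)
    W = np.linalg.lstsq(E, X @ np.linalg.pinv(H), rcond=None)[0]

pred = H.argmax(axis=0)                  # hard cluster assignments, length n
```

Because the centroids are linear combinations of a few exemplar columns, the factorization touches only the d x m sketch `E` rather than the full data matrix, which is the source of the efficiency and interpretability claims in the abstract.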
Pages: 324-357 (34 pages)
Related Papers (50 total)
  • [31] Parameterized low-rank binary matrix approximation
    Fomin, Fedor V.
    Golovach, Petr A.
    Panolan, Fahad
    DATA MINING AND KNOWLEDGE DISCOVERY, 2020, 34 (02) : 478 - 532
  • [32] DOA Estimation of Coherent Sources via Low-Rank Matrix Decomposition
    Yang, Zeqi
    Ma, Shuai
    Liu, Yiheng
    Zhang, Hua
    Lyu, Xiaode
    IEEE WIRELESS COMMUNICATIONS LETTERS, 2024, 13 (11) : 3049 - 3053
  • [33] Robust low-rank diffraction separation and imaging by CUR matrix decomposition
    Lin, Peng
    Peng, Suping
    Xiang, Yang
    Li, Chuangjian
    Cui, Xiaoqin
    GEOPHYSICS, 2023, 88 (06) : V415 - V429
  • [34] A Nonnegative Projection Based Algorithm for Low-Rank Nonnegative Matrix Approximation
    Wang, Peitao
    He, Zhaoshui
    Xie, Kan
    Gao, Junbin
    Antolovich, Michael
    NEURAL INFORMATION PROCESSING, ICONIP 2017, PT I, 2017, 10634 : 240 - 247
  • [36] Graph regularized low-rank representation for submodule clustering
    Wu, Tong
    PATTERN RECOGNITION, 2020, 100
  • [37] GRAPH-REGULARIZED FAST LOW-RANK MATRIX APPROXIMATION USING THE NYSTROM METHOD FOR CLUSTERING
    Lee, Jieun
    Choe, Yoonsik
    2018 IEEE 28TH INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP), 2018,
  • [38] Low-rank kernel learning for graph-based clustering
    Kang, Zhao
    Wen, Liangjian
    Chen, Wenyu
    Xu, Zenglin
    KNOWLEDGE-BASED SYSTEMS, 2019, 163 : 510 - 517
  • [39] Multilayer Sparsity-Based Tensor Decomposition for Low-Rank Tensor Completion
    Xue, Jize
    Zhao, Yongqiang
    Huang, Shaoguang
    Liao, Wenzhi
    Chan, Jonathan Cheung-Wai
    Kong, Seong G.
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (11) : 6916 - 6930
  • [40] Schur complement-based domain decomposition preconditioners with low-rank corrections
    Li, Ruipeng
    Xi, Yuanzhe
    Saad, Yousef
    NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS, 2016, 23 (04) : 706 - 729