Two-dimensional data partitioning for non-negative matrix tri-factorization

被引:0
作者
Yan, Jiaxing [1 ]
Liu, Hai [1 ]
Lei, Zhiqi [2 ]
Rao, Yanghui [1 ]
Liu, Guan [3 ]
Xie, Haoran [4 ]
Tao, Xiaohui [5 ]
Wang, Fu Lee [6 ]
机构
[1] Sun Yat Sen Univ, Sch Comp Sci & Engn, Guangzhou, Peoples R China
[2] Guangdong Power Grid Corp, Informat Ctr, Guangzhou, Peoples R China
[3] Jinan Univ, Computat Commun Studies, Guangzhou, Peoples R China
[4] Lingnan Univ, Sch Data Sci, Hong Kong, Peoples R China
[5] Univ Southern Queensland, Sch Sci, Toowoomba, Australia
[6] Hong Kong Metropolitan Univ, Sch Sci & Technol, Kowloon, Hong Kong, Peoples R China
基金
澳大利亚研究理事会;
关键词
Non-negative matrix tri-factorization; Text co-clustering; 2-Dimensional data partitioning; FRAMEWORK;
D O I
10.1016/j.bdr.2024.100473
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
As a two-sided clustering and dimensionality reduction paradigm, Non -negative Matrix Tri-Factorization (NMTF) has attracted much attention in machine learning and data mining researchers due to its excellent performance and reliable theoretical support. Unlike Non -negative Matrix Factorization (NMF) methods applicable to onesided clustering only, NMTF introduces an additional factor matrix and uses the inherent duality of data to realize the mutual promotion of sample clustering and feature clustering, thus showing great advantages in many scenarios (e.g., text co -clustering). However, the existing methods for solving NMTF usually involve intensive matrix multiplication, which is characterized by high time and space complexities, that is, there are limitations of slow convergence of the multiplicative update rules and high memory overhead. In order to solve the above problems, this paper develops a distributed parallel algorithm with a 2 -dimensional data partition scheme for NMTF (i.e., PNMTF-2D). Experiments on multiple text datasets show that the proposed PNMTF-2D can substantially improve the computational efficiency of NMTF (e.g., the average iteration time is reduced by up to 99.7% on Amazon) while ensuring the effectiveness of convergence and co -clustering.
引用
收藏
页数:9
相关论文
共 24 条
[1]   A Semi-NMF-PCA Unified Framework for Data Clustering [J].
Allab, Kais ;
Labiod, Lazhar ;
Nadif, Mohamed .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2017, 29 (01) :2-16
[2]  
Bouma G., 2009, P BIENN GSCL C 2009, P31
[3]   MONITORS, MESSAGES, AND CLUSTERS - THE P4 PARALLEL PROGRAMMING SYSTEM [J].
BUTLER, RM ;
LUSK, EL .
PARALLEL COMPUTING, 1994, 20 (04) :547-564
[4]   Collective communication: theory, practice, and experience [J].
Chan, Ernie ;
Heimlich, Marcel ;
Purkayastha, Avi ;
van de Geijn, Robert .
CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2007, 19 (13) :1749-1783
[5]   Parallel Non-Negative Matrix Tri-Factorization for Text Data Co-Clustering [J].
Chen, Yufu ;
Lei, Zhiqi ;
Rao, Yanghui ;
Xie, Haoran ;
Wang, Fu Lee ;
Yin, Jian ;
Li, Qing .
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (05) :5132-5146
[6]  
Clarke L., 1994, PROGRAMMING ENV MASS, P213, DOI [10.1007/978-3-0348-8534-8_21, DOI 10.1007/978-3-0348-8534-8_21, DOI 10.1007/978-3-0348-8534-821]
[7]  
Dhillon I. S., 2001, KDD-2001. Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, P269, DOI 10.1145/502512.502550
[8]  
Ding C., 2006, PROC 12 ACM SIGKDD I, P126
[9]   Parallel Nonnegative Matrix Factorization Algorithm on the Distributed Memory Platform [J].
Dong, Chao ;
Zhao, Huijie ;
Wang, Wei .
INTERNATIONAL JOURNAL OF PARALLEL PROGRAMMING, 2010, 38 (02) :117-137
[10]   Parallel Nonnegative Matrix Factorization via Newton Iteration [J].
Flatz, Markus ;
Vajtersic, Marian .
PARALLEL PROCESSING LETTERS, 2016, 26 (03)