Fast Flexible Bipartite Graph Model for Co-Clustering

Cited by: 13

Authors:

Chen, Wei [1]

Wang, Hongjun [1]

Long, Zhiguo [1]

Li, Tianrui [1]

Affiliations:

[1] Southwest Jiaotong Univ, Sch Comp & Artificial Intelligence, Chengdu 611756, Peoples R China

Source:

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING | 2023 / Vol. 35 / No. 07

Funding:

National Natural Science Foundation of China;

Keywords:

Co-clustering; bipartite graph partition; faster performance; flexibility; NONNEGATIVE MATRIX FACTORIZATION; INFORMATION;

DOI:

10.1109/TKDE.2022.3194275

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Co-clustering methods make use of the correlation between samples and attributes to explore the co-occurrence structure in data. These methods have played a significant role in gene expression analysis, image segmentation, and document clustering. In bipartite graph partition-based co-clustering methods, the relationship between samples and attributes is described by constructing a diagonal symmetric bipartite graph matrix, which is clustered by the philosophy of spectral clustering. However, this not only has high time complexity but also the same number of row and column clusters. In fact, the number of categories of rows and columns often changes in the real world. To address these problems, this paper proposes a novel fast flexible bipartite graph model for the co-clustering method (FBGPC) that directly uses the original matrix to construct the bipartite graph. Then, it uses the inflation operation to partition the bipartite graph in order to learn the co-occurrence structure of the original data matrix based on the inherent relationship between bipartite graph partitioning and co-clustering. Finally, hierarchical clustering is used to obtain the clustering results according to the set relationship of the co-occurrence structure. Extensive empirical results show the effectiveness of our proposed model and verify the faster performance, generality, and flexibility of our model.
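As a rough illustration of two ingredients the abstract names, the sketch below builds the symmetric bipartite graph matrix that conventional bipartite-partition methods construct, and applies an inflation step directly to the original data matrix. The abstract does not define the inflation operation, so the Markov-clustering-style form used here (element-wise power followed by column normalization) is an assumption, and the helper names `bipartite_adjacency` and `inflate` and the toy matrix `A` are hypothetical. This is not the FBGPC algorithm itself.

```python
import numpy as np


def bipartite_adjacency(A):
    """Build the (n+m) x (n+m) symmetric matrix [[0, A], [A.T, 0]]
    from an n x m sample-by-attribute matrix A."""
    n, m = A.shape
    top = np.hstack([np.zeros((n, n)), A])
    bottom = np.hstack([A.T, np.zeros((m, m))])
    return np.vstack([top, bottom])


def inflate(M, r=2.0):
    """Assumed MCL-style inflation: raise each entry to the power r,
    then rescale every column to sum to 1 (all-zero columns stay zero)."""
    P = np.power(M, r)
    col_sums = P.sum(axis=0, keepdims=True)
    return np.divide(P, col_sums, out=np.zeros_like(P), where=col_sums > 0)


# Toy nonnegative data matrix: 3 samples (rows) x 3 attributes (columns).
A = np.array([[5.0, 4.0, 0.0],
              [4.0, 5.0, 1.0],
              [0.0, 1.0, 6.0]])

W = bipartite_adjacency(A)  # the larger matrix used by spectral-style methods
P = inflate(A, r=2.0)       # inflation applied to the original matrix instead
```

Inflation strengthens the strong sample-attribute links in each column and weakens the weak ones, and it works on the n x m matrix rather than the (n+m) x (n+m) one.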

Citation

Pages: 6930-6940

Page count: 11

