Dual Hypergraph Regularized PCA for Biclustering of Tumor Gene Expression Data

Cited by: 15
Authors
Wang, Xuesong [1]
Liu, Jian [1]
Cheng, Yuhu [1]
Liu, Aiping [2]
Chen, Enhong [3]
Affiliations
[1] China Univ Min & Technol, Sch Informat & Control Engn, Xuzhou 221116, Jiangsu, Peoples R China
[2] Pacific Parkinsons Res Ctr, Vancouver, BC, Canada
[3] Univ Sci & Technol China, Sch Comp Sci & Technol, Hefei 230026, Anhui, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Principal component analysis; Manifolds; Gene expression; Tumors; Laplace equations; Clustering methods; Clustering algorithms; Biclustering; gene expression data; gene manifold; sample manifold; hypergraph regularization; principal component analysis; NONNEGATIVE MATRIX FACTORIZATION; DIMENSIONALITY REDUCTION; CLASS DISCOVERY; CANCER; CLASSIFICATION; PREDICTION;
DOI
10.1109/TKDE.2018.2874881
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Clustering is a powerful approach to analyzing gene expression data, which is crucial to the investigation of effective cancer treatments. Many graph-regularized clustering methods have been proposed and shown to be superior to traditional clustering methods. However, they focus only on the inner structure of the samples and fail to take the feature manifold into account. In gene expression data, it is practical to hypothesize that both the samples and the genes lie on nonlinear low-dimensional manifolds, namely the sample manifold and the gene manifold, respectively. Therefore, in this paper, incorporating the geometric structures of both samples and features, we propose a Dual Hypergraph Regularized PCA (DHPCA) method for biclustering of tumor data. First, for gene expression data, we construct two hypergraphs, i.e., a sample hypergraph and a gene hypergraph, to estimate the intrinsic geometric structures of samples and genes. Then, we introduce hypergraph regularization on both the gene side and the sample side. Finally, our biclustering method is formulated as two hypergraph-regularized PCA problems with closed-form solutions. We experimentally validate the proposed DHPCA algorithm on real applications, and the promising results indicate its potential in high-dimensional data analysis.
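The three steps named in the abstract (build a hypergraph per side, add a hypergraph regularizer, solve a PCA problem in closed form) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not state the objective, so the sketch assumes a graph-Laplacian-PCA-style objective, min ||X - U Q^T||_F^2 + alpha * tr(Q^T L Q) subject to Q^T Q = I, whose minimizer is given by the eigenvectors of alpha * L - X^T X with the smallest eigenvalues. The k-nearest-neighbour hyperedges, the unit hyperedge weights, the normalized hypergraph Laplacian, and the names `knn_hypergraph_laplacian`, `hypergraph_regularized_pca`, `alpha`, `k` and `r` are likewise assumptions made for illustration.

```python
import numpy as np


def knn_hypergraph_laplacian(points, k=3):
    """Normalized hypergraph Laplacian built from kNN hyperedges (assumed construction).

    points: (n, d) array, one row per vertex. Every vertex spawns one hyperedge
    holding itself and its k nearest neighbours; all hyperedge weights are 1.
    """
    n = points.shape[0]
    # Pairwise squared Euclidean distances between vertices.
    sq = np.sum(points ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * points @ points.T
    np.fill_diagonal(dist, -np.inf)  # make each vertex its own closest point

    # Incidence matrix: H[v, e] = 1 when vertex v belongs to hyperedge e.
    H = np.zeros((n, n))
    for e in range(n):
        members = np.argsort(dist[e])[: k + 1]  # the vertex itself plus k neighbours
        H[members, e] = 1.0

    w = np.ones(n)              # hyperedge weights
    dv = H @ w                  # vertex degrees (at least 1: own hyperedge)
    de = H.sum(axis=0)          # hyperedge degrees (k + 1 each)
    Hn = H / np.sqrt(dv)[:, None]        # Dv^{-1/2} H
    theta = (Hn * (w / de)) @ Hn.T       # Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}
    return np.eye(n) - theta


def hypergraph_regularized_pca(X, L, alpha, r):
    """Closed-form solution of the assumed regularized PCA objective.

    X: (p, n) data whose n columns are the items to embed.
    L: (n, n) Laplacian over those columns.
    Returns U (p, r) and an orthonormal embedding Q (n, r).
    """
    G = alpha * L - X.T @ X
    G = (G + G.T) / 2.0                  # remove round-off asymmetry
    _, vecs = np.linalg.eigh(G)          # eigenvalues in ascending order
    Q = vecs[:, :r]                      # r smallest eigenvalues
    U = X @ Q                            # optimal U for a fixed orthonormal Q
    return U, Q


# Toy genes-by-samples matrix (random numbers, not real expression data).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 12))

# Sample side: hypergraph over samples (columns of X).
L_samples = knn_hypergraph_laplacian(X.T, k=3)
_, Q_samples = hypergraph_regularized_pca(X, L_samples, alpha=1.0, r=2)

# Gene side: the same procedure applied to the transposed matrix.
L_genes = knn_hypergraph_laplacian(X, k=3)
_, Q_genes = hypergraph_regularized_pca(X.T, L_genes, alpha=1.0, r=2)
```

Under these assumptions, the rows of `Q_samples` and `Q_genes` are low-dimensional embeddings of samples and genes; a biclustering would then group each set of rows with a standard clustering step, which the sketch leaves out.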
Pages: 2292-2303
Page count: 12