Dual Hypergraph Regularized PCA for Biclustering of Tumor Gene Expression Data

Times cited: 15
Authors
Wang, Xuesong [1]
Liu, Jian [1]
Cheng, Yuhu [1]
Liu, Aiping [2]
Chen, Enhong [3]
Affiliations
[1] China Univ Min & Technol, Sch Informat & Control Engn, Xuzhou 221116, Jiangsu, Peoples R China
[2] Pacific Parkinsons Res Ctr, Vancouver, BC, Canada
[3] Univ Sci & Technol China, Sch Comp Sci & Technol, Hefei 230026, Anhui, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Principal component analysis; Manifolds; Gene expression; Tumors; Laplace equations; Clustering methods; Clustering algorithms; Biclustering; gene expression data; gene manifold; sample manifold; hypergraph regularization; principal component analysis; NONNEGATIVE MATRIX FACTORIZATION; DIMENSIONALITY REDUCTION; CLASS DISCOVERY; CANCER; CLASSIFICATION; PREDICTION;
DOI
10.1109/TKDE.2018.2874881
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Clustering is a powerful approach to analyzing gene expression data, which is crucial for investigating effective cancer treatments. Many graph regularization-based clustering methods have been proposed and shown to be superior to traditional clustering methods. However, they focus only on the inner structure of the samples and fail to take the feature manifold into account. For gene expression data, it is reasonable to hypothesize that both the samples and the genes lie on nonlinear low-dimensional manifolds, namely the sample manifold and the gene manifold, respectively. Therefore, in this paper, we incorporate the geometric structures of both samples and features and propose a Dual Hypergraph Regularized PCA (DHPCA) method for biclustering of tumor data. First, we construct two hypergraphs from the gene expression data, a sample hypergraph and a gene hypergraph, to estimate the intrinsic geometric structures of the samples and the genes. Then, we introduce hypergraph regularization on both the gene side and the sample side. Finally, our biclustering method is formulated as two hypergraph regularized PCA problems with closed-form solutions. We experimentally validate the proposed DHPCA algorithm on real applications, and the promising results indicate its potential for high-dimensional data analysis.
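The abstract only outlines the pipeline (two hypergraphs, regularization on both sides, closed-form PCA). The sketch below illustrates that general recipe under assumptions that are ours, not the paper's: each hyperedge is a vertex plus its k nearest neighbours weighted by a heat kernel; the Laplacian is the unnormalized hypergraph Laplacian L = D_v - H W D_e^{-1} H^T; and each side is solved as a graph-Laplacian PCA (gLPCA-style) problem, min ||X - U Q^T||_F^2 + alpha * tr(Q^T L Q) subject to Q^T Q = I. Substituting the optimal U = X Q reduces that objective to tr(Q^T (alpha L - X^T X) Q) plus a constant, so Q is given in closed form by the eigenvectors of alpha L - X^T X with the smallest eigenvalues. All function names and the parameters `knn`, `alpha`, and `beta` are hypothetical and are not the authors' DHPCA implementation.

```python
import numpy as np


def knn_hypergraph_laplacian(Z, k=5):
    """Unnormalized hypergraph Laplacian L = Dv - H W De^{-1} H^T.

    Rows of Z are the vertices. ASSUMPTION (not from the paper): vertex i
    spawns one hyperedge holding i and its k nearest Euclidean neighbours,
    weighted by a heat kernel whose bandwidth is the mean pairwise distance.
    """
    n = Z.shape[0]
    k = min(k, n - 1)
    sq = np.sum(Z ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T, 0.0)
    sigma2 = np.mean(np.sqrt(d2)) ** 2 + 1e-12
    H = np.zeros((n, n))  # incidence matrix: vertices x hyperedges
    w = np.zeros(n)       # one weight per hyperedge
    for i in range(n):
        order = np.argsort(d2[i])
        members = np.append(order[order != i][:k], i)  # i plus k neighbours
        H[members, i] = 1.0
        w[i] = np.sum(np.exp(-d2[i, members] / sigma2))
    de = H.sum(axis=0)          # hyperedge degrees delta(e) = k + 1
    dv = H @ w                  # vertex degrees d(v) = sum_e w(e) h(v, e)
    A = (H * (w / de)) @ H.T    # H W De^{-1} H^T (symmetric, nonnegative)
    return np.diag(dv) - A


def hypergraph_pca(X, L, n_components, alpha):
    """gLPCA-style closed form (assumed formulation):
    min ||X - U Q^T||_F^2 + alpha * tr(Q^T L Q)  s.t.  Q^T Q = I.
    Q = eigenvectors of (alpha * L - X^T X) with the smallest eigenvalues.
    """
    M = alpha * L - X.T @ X
    _, vecs = np.linalg.eigh(M)      # eigenvalues returned in ascending order
    Q = vecs[:, :n_components]
    return X @ Q, Q                  # U = X Q is the optimal loading matrix


def dual_hypergraph_embeddings(X, k_dims, alpha=1.0, beta=1.0, knn=5):
    """X is genes x samples. Returns gene embedding P and sample embedding Q."""
    Ls = knn_hypergraph_laplacian(X.T, knn)  # sample hypergraph (columns of X)
    Lg = knn_hypergraph_laplacian(X, knn)    # gene hypergraph (rows of X)
    _, Q = hypergraph_pca(X, Ls, k_dims, alpha)    # samples x k_dims
    _, P = hypergraph_pca(X.T, Lg, k_dims, beta)   # genes x k_dims
    return P, Q


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((40, 12))  # synthetic: 40 genes x 12 samples
    P, Q = dual_hypergraph_embeddings(X, k_dims=3)
    print(P.shape, Q.shape)  # (40, 3) (12, 3)
```

Clustering the rows of P and of Q separately, for example with scikit-learn's `KMeans`, would then give gene clusters and sample clusters. Solving the two sides independently is a simplification chosen for this sketch; the abstract does not specify how DHPCA couples them.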
Pages: 2292-2303
Page count: 12