RPCA-Based Tumor Classification Using Gene Expression Data

Cited by: 71
Authors
Liu, Jin-Xing [1, 2]
Xu, Yong [1, 3]
Zheng, Chun-Hou [4]
Kong, Heng [5]
Lai, Zhi-Hui [1]
Affiliations
[1] Harbin Inst Technol, Shenzhen Grad Sch, Biocomp Res Ctr, Shenzhen 518055, Guangdong, Peoples R China
[2] Qufu Normal Univ, Sch Informat Sci & Engn, Rizhao 276826, Shandong, Peoples R China
[3] Key Lab Network Oriented Intelligent Computat, Shenzhen 518055, Guangdong, Peoples R China
[4] Anhui Univ, Coll Elect Engn & Automat, Hefei 230039, Anhui, Peoples R China
[5] Nan Shan Dist Peoples Hosp, Dept Gen Surg, Shenzhen 518055, Guangdong, Peoples R China
Funding
China Postdoctoral Science Foundation;
Keywords
Classification; data mining; feature selection; principal component analysis; sparse method; MOLECULAR CLASSIFICATION; SPARSE REPRESENTATION; FEATURE-SELECTION; PREDICTION; REGULARIZATION; ALGORITHM; DISCOVERY; CANCER;
DOI
10.1109/TCBB.2014.2383375
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Microarray techniques have been used to delineate cancer groups and to identify candidate genes for cancer prognosis. Because these problems can be framed as classification tasks, various classification methods have been applied to analyze and interpret gene expression data. In this paper, we propose a novel method based on robust principal component analysis (RPCA) to classify tumor samples from gene expression data. First, RPCA is used to highlight the characteristic genes associated with a specific biological process. Then, RPCA and RPCA+LDA (RPCA followed by linear discriminant analysis) are used to identify the features. Finally, a support vector machine (SVM) classifies the tumor samples based on the identified features. Experiments on seven data sets demonstrate that the proposed methods are effective and feasible for tumor classification.
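The first step described in the abstract, using RPCA to highlight characteristic genes, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. It assumes RPCA is solved as principal component pursuit with a fixed-penalty augmented Lagrangian iteration, it runs on synthetic data, and the function names (`rpca`, `soft_threshold`, `svd_threshold`) and the gene-scoring rule (row-wise sum of absolute sparse entries) are hypothetical choices made for this example.

```python
import numpy as np

def soft_threshold(X, tau):
    """Elementwise shrinkage (proximal operator of the l1 norm)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svd_threshold(X, tau):
    """Singular value shrinkage (proximal operator of the nuclear norm)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def rpca(D, lam=None, mu=None, tol=1e-7, max_iter=1000):
    """Split D into a low-rank part A and a sparse part S with D ~ A + S."""
    D = np.asarray(D, dtype=float)
    m, n = D.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))       # common default weight on the l1 term
    if mu is None:
        mu = 0.25 * m * n / np.abs(D).sum()  # assumed fixed penalty parameter
    S = np.zeros_like(D)
    Y = np.zeros_like(D)                     # Lagrange multiplier
    norm_D = np.linalg.norm(D)
    for _ in range(max_iter):
        A = svd_threshold(D - S + Y / mu, 1.0 / mu)
        S = soft_threshold(D - A + Y / mu, lam / mu)
        R = D - A - S                        # constraint residual
        Y = Y + mu * R
        if np.linalg.norm(R) <= tol * norm_D:
            break
    return A, S

# Synthetic stand-in for an expression matrix (genes x samples).
rng = np.random.default_rng(0)
n_genes, n_samples = 100, 20
low_rank = rng.normal(size=(n_genes, 2)) @ rng.normal(size=(2, n_samples))
sparse = np.zeros((n_genes, n_samples))
planted = rng.choice(n_genes, size=5, replace=False)
for g in planted:
    cols = rng.choice(n_samples, size=6, replace=False)
    sparse[g, cols] = 10.0 * rng.choice([-1.0, 1.0], size=6)
D = low_rank + sparse

A, S = rpca(D)

# Score each gene by the total magnitude of its sparse entries (hypothetical rule).
scores = np.abs(S).sum(axis=1)
top_genes = np.argsort(scores)[::-1][:5]
print("highest-scoring gene indices:", top_genes)
```

In a full pipeline along the lines of the abstract, the rows selected this way would then be passed to LDA and an SVM classifier. Those stages are omitted here.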
Pages: 964-970
Page count: 7