Sparse Gene Expression Data Analysis Based on Truncated Power

Cited by: 0
Authors
Shen, Ningmin [1]
Li, Jing [1]
Jin, Cheng [1]
Zhou, Peiyun [1]
Affiliation
[1] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing 210016, Jiangsu, Peoples R China
Source
2014 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM) | 2014
Keywords
Gene expression data; sparse principal component analysis; feature extraction; Truncated Power; PRINCIPAL COMPONENT ANALYSIS; CLASSIFICATION;
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
Cluster analysis has become a popular method for gene expression data and can support accurate and rapid disease diagnosis through class labels. However, gene expression data have many attributes and few samples, which produces a large amount of redundant or noisy information and lowers the accuracy of clustering applied directly to the high-dimensional data. Principal Component Analysis (PCA) is a classical dimension-reduction method that transforms high-dimensional data into a low-dimensional space. Its shortcoming is weak interpretability, because the loadings are not sparse. In this paper, a sparse PCA method based on Truncated Power, which minimizes the cardinality of the loadings while maximizing the percentage of variance explained by the principal components (PCs), is applied to feature extraction for gene expression data; the sparse PCs are then fed into K-means for clustering. Finally, experimental results on three typical gene datasets verify that the sparse gene data can improve the efficiency and accuracy of clustering analysis.
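The pipeline described in the abstract (sparse loadings from a truncated power iteration, then K-means on the resulting sparse PCs) might be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `truncated_power`, `sparse_pcs`, and `kmeans`, the projection deflation used to obtain further components, and the farthest-point K-means initialization are all assumptions made for the example, and the cardinality `k` is taken as a fixed input rather than tuned.

```python
import numpy as np


def truncated_power(A, k, n_iter=200, tol=1e-10):
    """Approximate the leading eigenvector of symmetric PSD A with at most k nonzeros."""
    p = A.shape[0]
    # Assumed initialization: unit vector on the feature with the largest variance.
    x = np.zeros(p)
    x[np.argmax(np.diag(A))] = 1.0
    for _ in range(n_iter):
        y = A @ x                              # ordinary power step
        keep = np.argsort(np.abs(y))[-k:]      # indices of the k largest magnitudes
        x_new = np.zeros(p)
        x_new[keep] = y[keep]                  # truncation: zero out everything else
        norm = np.linalg.norm(x_new)
        if norm == 0.0:
            break
        x_new /= norm
        converged = np.linalg.norm(x_new - x) < tol
        x = x_new
        if converged:
            break
    return x


def sparse_pcs(X, n_components, k):
    """Return (scores, loadings) for n_components sparse PCs of the samples-by-genes matrix X."""
    Xc = X - X.mean(axis=0)                    # center each gene
    A = Xc.T @ Xc / (X.shape[0] - 1)           # sample covariance matrix
    p = A.shape[0]
    V = np.zeros((p, n_components))
    for j in range(n_components):
        v = truncated_power(A, k)
        V[:, j] = v
        # Assumed deflation: project the found direction out of the covariance.
        P = np.eye(p) - np.outer(v, v)
        A = P @ A @ P
    return Xc @ V, V


def kmeans(Z, n_clusters, n_iter=100):
    """Plain Lloyd iterations with a deterministic farthest-point initialization."""
    centers = [Z[0]]
    for _ in range(1, n_clusters):
        d = np.min([np.linalg.norm(Z - c, axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    labels = np.zeros(len(Z), dtype=int)
    for _ in range(n_iter):
        dist = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new_centers = centers.copy()
        for c in range(n_clusters):
            if np.any(labels == c):            # keep the old center if a cluster is empty
                new_centers[c] = Z[labels == c].mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels
```

Calling `sparse_pcs(X, n_components, k)` on a samples-by-genes matrix returns the low-dimensional scores and the sparse loading vectors; passing the scores to `kmeans` gives cluster labels. Because each loading has at most `k` nonzero entries, the genes contributing to each PC can be read directly from its support.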
Pages: 6