A novel dimensionality reduction technique based on independent component analysis for modeling microarray gene expression data

Cited by: 0
Authors
Liu, H [1]
Kustra, R [1]
Zhang, J [1]
Affiliation
[1] Univ Toronto, Dept Comp Sci, Toronto, ON M5S 3G4, Canada
Source
IC-AI '04 & MLMTA'04, VOL 1 AND 2, PROCEEDINGS | 2004
Keywords
gene expression data; dimensionality reduction; independent component analysis; latent regulatory factors
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
DNA microarray experiments, which generate thousands of gene expression measurements, are being used to gather information from tissue and cell samples about gene expression differences that will be useful in diagnosing disease. One challenge of microarray studies is that the number n of samples collected is small relative to the number p of genes measured per sample, which is usually in the thousands. In statistical terms, this very large number of predictors relative to the small number of samples (observations) makes the classification problem difficult; this is known as the "curse of dimensionality". An efficient way to address the problem is to use dimensionality reduction techniques. Principal Component Analysis (PCA) is a leading method for dimensionality reduction of gene expression data and is optimal in the least-squares sense. In this paper we propose a new dimensionality reduction technique for specific bioinformatics applications based on Independent Component Analysis (ICA). Because it can exploit higher-order statistics to identify a linear model, this ICA-based dimensionality reduction technique outperforms PCA in terms of both statistical and biological significance. We present experiments on the NCI 60 dataset to show this result.
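The contrast the abstract draws between PCA (second-order, least-squares optimal) and ICA (higher-order statistics) can be illustrated with a small sketch. This is not the authors' method, which the abstract does not specify. It is a generic NumPy-only example that assumes samples are rows and genes are columns, uses synthetic data rather than NCI 60, and implements a standard symmetric FastICA iteration with a tanh nonlinearity after PCA whitening. The function names (`pca_reduce`, `ica_reduce`, `excess_kurtosis`) and the sizes n = 60, p = 1000, k = 5 are hypothetical choices made for illustration.

```python
import numpy as np


def _center_svd(X):
    """Center the columns of X and return its thin SVD."""
    Xc = X - X.mean(axis=0)
    return np.linalg.svd(Xc, full_matrices=False)


def pca_reduce(X, k):
    """Project the n samples (rows of X) onto the top-k principal components."""
    U, s, _ = _center_svd(X)
    return U[:, :k] * s[:k]


def _sym_decorrelate(W):
    """Symmetric orthogonalization: W <- (W W^T)^(-1/2) W."""
    vals, vecs = np.linalg.eigh(W @ W.T)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T @ W


def ica_reduce(X, k, n_iter=500, tol=1e-8, seed=0):
    """PCA-whiten to k dimensions, then rotate with symmetric FastICA (tanh)."""
    n = X.shape[0]
    U, _, _ = _center_svd(X)
    Z = U[:, :k] * np.sqrt(n)  # whitened scores: Z.T @ Z / n == I
    rng = np.random.default_rng(seed)
    W = _sym_decorrelate(rng.standard_normal((k, k)))
    for _ in range(n_iter):
        Y = Z @ W.T
        G = np.tanh(Y)
        G_prime = 1.0 - G ** 2
        # Fixed-point update per row: w <- E[z g(w.z)] - E[g'(w.z)] w
        W_new = (G.T @ Z) / n - G_prime.mean(axis=0)[:, None] * W
        W_new = _sym_decorrelate(W_new)
        # Converged when each new row is parallel to the old one
        delta = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0))
        W = W_new
        if delta < tol:
            break
    return Z @ W.T


def excess_kurtosis(Y):
    """Column-wise excess kurtosis, a simple higher-order statistic."""
    Yc = Y - Y.mean(axis=0)
    return (Yc ** 4).mean(axis=0) / (Yc ** 2).mean(axis=0) ** 2 - 3.0


# Synthetic stand-in for expression data (an assumption, not NCI 60):
# n samples, p genes, k non-Gaussian latent factors plus Gaussian noise.
rng = np.random.default_rng(0)
n, p, k = 60, 1000, 5
sources = rng.laplace(size=(n, k))
mixing = rng.standard_normal((k, p))
X = sources @ mixing + 0.1 * rng.standard_normal((n, p))

pca_scores = pca_reduce(X, k)
ica_scores = ica_reduce(X, k)

print("PCA excess kurtosis:", np.round(excess_kurtosis(pca_scores), 2))
print("ICA excess kurtosis:", np.round(excess_kurtosis(ica_scores), 2))
```

Both reductions span the same k-dimensional subspace here; ICA only re-rotates the whitened PCA scores so that the components are as non-Gaussian as possible, which is one common way to read "exploiting higher-order statistics". Either set of scores could then be passed to a classifier in place of the p original gene measurements.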
Pages: 1133-1139
Page count: 7