Unsupervised reduction of the dimensionality followed by supervised learning with a perceptron improves the classification of conditions in DNA microarray gene expression data.

Cited by: 0
Authors
Conde, L [1]
Mateos, A [1]
Herrero, J [1]
Dopazo, J [1]
Affiliations
[1] Spanish Natl Canc Ctr, Bioinformat Unit, Biotechnol Programme, CNIO, Madrid 28029, Spain
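Source
NEURAL NETWORKS FOR SIGNAL PROCESSING XII, PROCEEDINGS | 2002
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract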
This manuscript describes a combined approach of unsupervised clustering followed by supervised learning that provides an efficient classification of conditions in DNA array gene expression experiments (different cell lines, including some cancer types, in the cases shown). First, the dimensionality of the dataset of gene expression profiles is reduced to a number of non-redundant clusters of co-expressing genes using an unsupervised clustering algorithm, the Self-Organizing Tree Algorithm (SOTA), a hierarchical version of Self-Organizing Maps (SOM). Then, the average values of these clusters are used to train a perceptron that produces a very efficient classification of the conditions. This way of reducing the dimensionality of the dataset seems to perform better than previously proposed alternatives such as PCA. In addition, the weights that connect the gene clusters to the different experimental conditions can be used to assess the relative importance of the genes in the definition of these classes. Finally, Gene Ontology (GO) terms are used to infer a possible biological role for these groups of genes and to assess the validity of the classification from a biological point of view.
Pages: 77-86
Number of pages: 10
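
The pipeline outlined in the abstract (cluster the genes, average each cluster per condition, train a perceptron on the averages, then inspect the weights) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the data are synthetic, plain k-means stands in for SOTA (which is not implemented here), and the names `kmeans_genes` and `train_perceptron` are hypothetical helpers, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumption): 60 genes x 40 conditions, two condition classes.
n_genes, n_conditions = 60, 40
labels = np.array([0] * 20 + [1] * 20)       # class of each condition
expr = rng.normal(0.0, 0.3, size=(n_genes, n_conditions))
expr[:20, labels == 1] += 2.0                # genes 0-19 are up in class 1
expr[20:40, labels == 0] += 2.0              # genes 20-39 are up in class 0
                                             # genes 40-59 carry no class signal


def kmeans_genes(x, k, n_iter=20):
    """Plain k-means over gene profiles; a stand-in for SOTA, not SOTA itself."""
    # Deterministic start: k evenly spaced gene profiles as initial centers.
    centers = x[np.linspace(0, len(x) - 1, k).astype(int)].copy()
    for _ in range(n_iter):
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dist.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = x[assign == j].mean(axis=0)
    return assign


def train_perceptron(x, y, epochs=50, lr=0.1):
    """Classic single-layer perceptron with a threshold output (two classes)."""
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(x, y):
            pred = int(xi @ w + b > 0)
            if pred != yi:
                w += lr * (yi - pred) * xi
                b += lr * (yi - pred)
                errors += 1
        if errors == 0:                      # stop once every condition is correct
            break
    return w, b


# Step 1: reduce dimensionality by grouping genes into clusters.
assign = kmeans_genes(expr, k=3)
cluster_ids = np.unique(assign)

# Step 2: one feature per cluster, the mean expression of its genes per condition.
features = np.stack(
    [expr[assign == j].mean(axis=0) for j in cluster_ids], axis=1
)                                            # shape: (n_conditions, n_clusters)

# Step 3: train the perceptron on the cluster averages.
w, b = train_perceptron(features, labels)
pred = (features @ w + b > 0).astype(int)
accuracy = float((pred == labels).mean())    # training accuracy only

# Step 4: weight magnitudes hint at which gene clusters separate the classes.
for j, weight in zip(cluster_ids, w):
    print(f"cluster {j}: {np.sum(assign == j)} genes, weight {weight:+.3f}")
print(f"training accuracy: {accuracy:.2f}")
```

In this toy setting the clusters with the largest absolute weights are the ones whose genes differ between the two classes, which mirrors the abstract's point that the weights can be used to rank gene clusters. The sketch reports training accuracy only; a real evaluation would need held-out conditions, and the Gene Ontology step is not shown.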