Graphical exploration of gene expression data: A comparative study of three multivariate methods

Cited by: 56
Authors
Wouters, L
Göhlmann, HW
Bijnens, L
Kass, SU
Molenberghs, G
Lewi, PJ
Affiliations
[1] Univ Limburg, Ctr Stat, Limburgs Univ Centrum, B-3590 Diepenbeek, Belgium
[2] Janssen Pharmaceut, Johnson & Johnson Pharmaceut Res & Dev, Ctr Mol Design, B-2340 Beerse, Belgium
Keywords
bioinformatics; biplot; correspondence factor analysis; data mining; data visualization; gene expression data; microarray data; multivariate exploratory data analysis; principal component analysis; spectral map analysis
DOI
10.1111/j.0006-341X.2003.00130.x
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification code
07; 0710; 09
Abstract
This article describes three multivariate projection methods and compares them for their ability to identify clusters of biological samples and genes using real-life data on gene expression levels of leukemia patients. It is shown that principal component analysis (PCA) has the disadvantage that the resulting principal factors are not very informative, while correspondence factor analysis (CFA) has difficulties interpreting distances between objects. Spectral map analysis (SMA) is introduced as an alternative approach to the analysis of microarray data. Weighted SMA outperforms PCA, and is at least as powerful as CFA, in finding clusters in the samples, as well as identifying genes related to these clusters. SMA addresses the problem of data analysis in microarray experiments in a more appropriate manner than CFA, and allows more flexible weighting to the genes and samples. Proper weighting is important, since it enables less reliable data to be down-weighted and more reliable information to be emphasized.
Pages: 1131-1139
Number of pages: 9