Multivariate regression analysis of distance matrices for testing associations between gene expression patterns and related variables

Cited by: 222
Authors
Zapala, Matthew A.
Schork, Nicholas J. [1]
Affiliations
[1] Univ Calif San Diego, Moores UCSD Canc Ctr, Ctr Human Genet & Genom, Dept Psychiat,Biomed Sci Grad Program, La Jolla, CA 92093 USA
[2] Univ Calif San Diego, Moores UCSD Canc Ctr, Ctr Human Genet & Genom, Dept Psychiat,Polymorphism Res Lab, La Jolla, CA 92093 USA
[3] Univ Calif San Diego, Moores UCSD Canc Ctr, Ctr Human Genet & Genom, Dept Family & Prevent Med,Div Biostat, La Jolla, CA 92093 USA
[4] Univ Calif San Diego, Calif Inst Telecommun & Informat Technol, La Jolla, CA 92093 USA
Keywords
analysis of variance; high-dimensional data; SINGULAR-VALUE DECOMPOSITION; CYCLIN-DEPENDENT KINASE-5; PHYLOGENETIC TREES; UP-REGULATION; COMPLEX
DOI
10.1073/pnas.0609333103
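Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract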
A fundamental step in the analysis of gene expression and other high-dimensional genomic data is the calculation of the similarity or distance between pairs of individual samples in a study. If one has collected N total samples and assayed the expression level of G genes on those samples, then an N x N similarity matrix can be formed that reflects the correlation or similarity of the samples with respect to the expression values over the G genes. This matrix can then be examined for patterns via standard data reduction and cluster analysis techniques. We consider an alternative to conventional data reduction and cluster analyses of similarity matrices that is rooted in traditional linear models. This analysis method allows predictor variables collected on the samples to be related to variation in the pairwise similarity/distance values reflected in the matrix. The proposed multivariate method avoids the need for reducing the dimensions of a similarity matrix, can be used to assess relationships between the genes used to construct the matrix and additional information collected on the samples under study, and can be used to analyze individual genes or groups of genes identified in different ways. The technique can be used with any high-dimensional assay or data type and is ideally suited for testing subsets of genes defined by their participation in a biochemical pathway or other a priori grouping. We showcase the methodology using three published gene expression data sets.
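The workflow the abstract describes (form an N x N distance matrix from G genes, then relate predictor variables to variation in that matrix without reducing its dimensions) can be illustrated with a minimal Python sketch. The abstract does not state the test statistic, so everything specific here is an assumption: the correlation-based distance sqrt(2(1 - r)), the Gower-centered pseudo-F statistic, the permutation p-value, the simulated data, and all function names are illustrative choices, not the authors' published implementation.

```python
import numpy as np

def correlation_distance(expr):
    """expr: (N samples, G genes). Returns an N x N distance matrix
    d_ij = sqrt(2 * (1 - r_ij)), r_ij = Pearson correlation of samples i, j.
    (Assumed distance; any other distance matrix could be used instead.)"""
    r = np.clip(np.corrcoef(expr), -1.0, 1.0)
    d = np.sqrt(2.0 * (1.0 - r))
    np.fill_diagonal(d, 0.0)
    return d

def gower_center(d):
    """Gower-centered matrix G = C A C with A = -0.5 * d**2, C = I - 11'/N."""
    n = d.shape[0]
    c = np.eye(n) - np.ones((n, n)) / n
    return c @ (-0.5 * d ** 2) @ c

def pseudo_f(g, x):
    """Pseudo-F: variation in G explained by predictors x (N x p) relative
    to residual variation, using the hat matrix of [intercept, x]."""
    n = x.shape[0]
    x1 = np.column_stack([np.ones(n), x])
    m = x1.shape[1]
    h = x1 @ np.linalg.pinv(x1)      # projection onto the column space of x1
    resid = np.eye(n) - h
    num = np.trace(h @ g @ h) / (m - 1)
    den = np.trace(resid @ g @ resid) / (n - m)
    return num / den

def permutation_test(d, x, n_perm=199, seed=0):
    """Permutation p-value: shuffle the sample labels of the predictors."""
    rng = np.random.default_rng(seed)
    n = d.shape[0]
    g = gower_center(d)
    x = np.asarray(x, dtype=float).reshape(n, -1)
    observed = pseudo_f(g, x)
    hits = sum(pseudo_f(g, x[rng.permutation(n)]) >= observed
               for _ in range(n_perm))
    return observed, (hits + 1) / (n_perm + 1)

# Simulated example (hypothetical data): 20 samples, 50 genes, two groups,
# with the first 10 genes shifted upward in the second group.
rng = np.random.default_rng(1)
group = np.repeat([0.0, 1.0], 10)
expr = rng.normal(size=(20, 50))
expr[group == 1.0, :10] += 2.0

dist = correlation_distance(expr)
f_stat, p_value = permutation_test(dist, group)
print(f"pseudo-F = {f_stat:.3f}, permutation p = {p_value:.3f}")
```

To test a subset of genes, such as those in one pathway, the same functions could be applied to a distance matrix built from only the corresponding columns of `expr`.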
Pages: 19430-19435
Number of pages: 6