Exploratory Analysis of Multiple Omics Datasets Using the Adjusted RV Coefficient

Cited by: 28
Authors
Mayer, Claus-Dieter [1]
Lorent, Julie [2]
Horgan, Graham W. [1]
Affiliations
[1] Biomath & Stat Scotland, Edinburgh, Midlothian, Scotland
[2] Inst Natl Sci Appl Toulouse, Toulouse, France
Keywords
RV coefficient; data integration; multivariate analysis; omics data; CO-INERTIA ANALYSIS; MULTIVARIATE-ANALYSIS; PACKAGE; TOOL;
DOI
10.2202/1544-6115.1540
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
The integration of multiple high-dimensional data sets (omics data) has been a very active but challenging area of bioinformatics research in recent years. Various adaptations of non-standard multivariate statistical tools have been suggested that allow such data sets to be analyzed and visualized simultaneously. However, these methods can typically deal with only two data sets, whereas systems biology experiments often generate larger numbers of high-dimensional data sets. For this reason, we suggest an exploratory analysis of similarity between data sets as an initial analysis step. This analysis is based on the RV coefficient, a matrix correlation that can be interpreted as a generalization of the squared correlation from two single variables to two sets of variables. It has been shown previously, however, that the high dimensionality of the data introduces substantial bias into the RV. We therefore introduce an alternative version, the adjusted RV, which is unbiased in the case of independent data sets. We also show that in many situations, particularly for very high-dimensional data sets, the adjusted RV is a better estimator than previous RV versions in terms of the mean square error and the power of the independence test based on it. We demonstrate the usefulness of the adjusted RV by applying it to a collection of 19 different multivariate data sets from a systems biology experiment. The pairwise RV values between the data sets define a similarity matrix that can be used as input to hierarchical clustering or multidimensional scaling. We show that this reveals biologically meaningful subgroups of data sets in our study.
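For orientation, the following is a minimal sketch of the standard (unadjusted) RV coefficient that the abstract takes as its starting point, computed as tr(XX'YY') / sqrt(tr((XX')^2) tr((YY')^2)) on column-centered data. It does not implement the adjusted RV, whose definition is not given in this record. The function names (`center_columns`, `gram`, `trace_product`, `rv_coefficient`, `pairwise_rv`) and the toy data are illustrative assumptions, not taken from the paper.

```python
import math


def center_columns(rows):
    """Subtract each column's mean so every variable has mean zero."""
    n = len(rows)
    p = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[r[j] - means[j] for j in range(p)] for r in rows]


def gram(rows):
    """Return the n x n matrix of inner products between samples (X X^T)."""
    return [[sum(a * b for a, b in zip(ri, rk)) for rk in rows] for ri in rows]


def trace_product(a, b):
    """tr(A B) for symmetric A and B, as the sum of elementwise products."""
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def rv_coefficient(x, y):
    """Standard RV coefficient between two data sets measured on the same samples.

    x and y are lists of rows (samples); they may have different numbers of
    columns (variables) but must have the same number of rows.
    """
    sx = gram(center_columns(x))
    sy = gram(center_columns(y))
    return trace_product(sx, sy) / math.sqrt(
        trace_product(sx, sx) * trace_product(sy, sy)
    )


def pairwise_rv(datasets):
    """Similarity matrix of RV values between every pair of data sets."""
    return [[rv_coefficient(a, b) for b in datasets] for a in datasets]


if __name__ == "__main__":
    # Hypothetical toy data: 4 samples, one variable in each data set.
    x = [[1.0], [2.0], [3.0], [4.0]]
    y = [[2.0], [1.0], [4.0], [3.0]]
    # With one variable per set, the RV reduces to the squared correlation.
    print(rv_coefficient(x, y))
    print(pairwise_rv([x, y]))
```

With a single variable in each data set the value equals the squared Pearson correlation, which is the generalization property the abstract describes. A matrix such as the one returned by `pairwise_rv` could be converted to dissimilarities (for example 1 - RV, an assumption made here) before being passed to a clustering or scaling routine.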
Pages: 29