JOINT AND INDIVIDUAL VARIATION EXPLAINED (JIVE) FOR INTEGRATED ANALYSIS OF MULTIPLE DATA TYPES

Cited by: 350
Authors
Lock, Eric F. [1]
Hoadley, Katherine A. [2]
Marron, J. S. [1]
Nobel, Andrew B. [1]
Affiliations
[1] Univ N Carolina, Dept Stat & Operat Res, Chapel Hill, NC 27599 USA
[2] Univ N Carolina, Lineberger Comprehens Canc Ctr, Chapel Hill, NC 27599 USA
Funding
National Science Foundation (USA);
Keywords
Data integration; multi-block data; principal component analysis; data fusion; MODEL; GLIOBLASTOMA; MULTIBLOCK; PLS;
DOI
10.1214/12-AOAS597
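Chinese Library Classification number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline classification codes
020208; 070103; 0714;
Abstract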
Research in several fields now requires the analysis of data sets in which multiple high-dimensional types of data are available for a common set of objects. In particular, The Cancer Genome Atlas (TCGA) includes data from several diverse genomic technologies on the same cancerous tumor samples. In this paper we introduce Joint and Individual Variation Explained (JIVE), a general decomposition of variation for the integrated analysis of such data sets. The decomposition consists of three terms: a low-rank approximation capturing joint variation across data types, low-rank approximations for structured variation individual to each data type, and residual noise. JIVE quantifies the amount of joint variation between data types, reduces the dimensionality of the data and provides new directions for the visual exploration of joint and individual structures. The proposed method represents an extension of Principal Component Analysis and has clear advantages over popular two-block methods such as Canonical Correlation Analysis and Partial Least Squares. A JIVE analysis of gene expression and miRNA data on Glioblastoma Multiforme tumor samples reveals gene-miRNA associations and provides better characterization of tumor types. Data and software are available at https://genome.unc.edu/jive/.
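The three-term decomposition described in the abstract can be written, for each data type i, as X_i = J_i + A_i + E_i, where the stacked joint terms J_i have low rank, each individual term A_i has low rank, and E_i is residual noise. The following is a minimal sketch of one way to estimate such a decomposition, not the authors' released software. It assumes that rows are variables and columns are the shared objects, that the ranks are supplied by the user, and that alternating truncated SVDs with an orthogonality constraint between joint and individual row spaces is an acceptable estimation strategy. The names `truncated_svd`, `jive_sketch`, `joint_rank`, and `indiv_ranks` are hypothetical and chosen for illustration only.

```python
import numpy as np


def truncated_svd(M, rank):
    """Return the best rank-`rank` approximation of M (Frobenius norm)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]


def jive_sketch(blocks, joint_rank, indiv_ranks, n_iter=50):
    """Illustrative joint/individual/residual decomposition.

    blocks      : list of arrays, each (p_i x n), sharing the same n columns
    joint_rank  : assumed rank of the stacked joint structure
    indiv_ranks : assumed rank of the individual structure of each block
    Returns (joint_blocks, indiv_blocks, residual_blocks).
    """
    blocks = [np.asarray(X, dtype=float) for X in blocks]
    n = blocks[0].shape[1]
    # Row offsets used to cut the stacked joint matrix back into blocks.
    splits = np.cumsum([X.shape[0] for X in blocks])[:-1]
    stacked = np.vstack(blocks)
    indiv = [np.zeros_like(X) for X in blocks]

    for _ in range(n_iter):
        # Joint step: low-rank fit to the stacked data minus individual terms.
        joint = truncated_svd(stacked - np.vstack(indiv), joint_rank)
        joint_blocks = np.split(joint, splits, axis=0)

        # Projector onto the orthogonal complement of the joint row space.
        _, _, Vt = np.linalg.svd(joint, full_matrices=False)
        V = Vt[:joint_rank].T
        proj = np.eye(n) - V @ V.T

        # Individual step: low-rank fit per block, kept orthogonal to joint.
        indiv = [
            truncated_svd((X - J) @ proj, r)
            for X, J, r in zip(blocks, joint_blocks, indiv_ranks)
        ]

    residual = [X - J - A for X, J, A in zip(blocks, joint_blocks, indiv)]
    return joint_blocks, indiv, residual
```

Under these assumptions, the share of variation attributed to each term for block i can be read off as the squared Frobenius norm of J_i, A_i, or E_i divided by that of X_i. The sketch does not cover centering, scaling of blocks, or rank selection, which this record does not describe.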
Pages: 523-542
Number of pages: 20