Bayesian variable selection with graphical structure learning: Applications in integrative genomics

Cited by: 7
Authors
Kundu, Suprateek [1]
Cheng, Yichen [2]
Shin, Minsuk [3]
Manyam, Ganiraju [4]
Mallick, Bani K. [3]
Baladandayuthapani, Veerabhadran [4]
Affiliations
[1] Emory Univ, Dept Biostat & Bioinformat, 1518 Clifton Rd, Atlanta, GA 30322 USA
[2] Georgia State Univ, Robinson Coll Business, 35 Brd St NW, Atlanta, GA 30303 USA
[3] Texas A&M, Dept Stat, 155 Ireland St, College Stn, TX 77843 USA
[4] MD Anderson Canc Res Ctr, Dept Biostat, Houston, TX 77030 USA
Source
PLOS ONE | 2018 / Vol. 13 / Issue 07
Funding
National Institutes of Health (USA);
Keywords
GLIOBLASTOMA-MULTIFORME; COPY NUMBER; HUMAN-COLON; GENE; CANCER; REGRESSION; MODELS; LASSO; INFORMATION; ACTIVATION;
DOI
10.1371/journal.pone.0195070
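Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code
07; 0710; 09;
Abstract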
Significant advances in biotechnology have allowed for simultaneous measurement of molecular data across multiple genomic, epigenomic and transcriptomic levels from a single tumor/patient sample. This has motivated systematic data-driven approaches to integrate multi-dimensional structured datasets, since cancer development and progression are driven by numerous co-ordinated molecular alterations and the interactions between them. We propose a novel multi-scale Bayesian approach that combines integrative graphical structure learning from multiple sources of data with a variable selection framework to determine the key genomic drivers of cancer progression. The integrative structure learning is first accomplished through novel joint graphical models for heterogeneous (mixed scale) data, allowing for flexible and interpretable incorporation of prior existing knowledge. This subsequently informs a variable selection step to identify groups of co-ordinated molecular features within and across platforms associated with clinical outcomes of cancer progression, while making appropriate adjustments for multicollinearity and multiplicities. We evaluate our methods through rigorous simulations to establish superiority over existing methods that do not take the network and/or prior information into account. Our methods are motivated by and applied to a glioblastoma multiforme (GBM) dataset from The Cancer Genome Atlas to predict patient survival times integrating gene expression, copy number and methylation data. We find a high concordance between our selected prognostic gene network modules and known associations with GBM. In addition, our model discovers several novel cross-platform network interactions (both cis- and trans-acting) between gene expression, copy number variation-associated gene dosing and epigenetic regulation through promoter methylation, some with known implications in the etiology of GBM.
Our framework provides a useful tool for biomedical researchers, since clinical prediction using multi-platform genomic information is an important step towards personalized treatment of many cancers.
Pages: 29