Incorporating prior information in gene expression network-based cancer heterogeneity analysis

Authors
Li, Rong [1]
Xu, Shaodong [2, 3]
Li, Yang [2, 3]
Tang, Zuojian [4]
Feng, Di [4]
Cai, James [4]
Ma, Shuangge [1]
Affiliations
[1] Yale Sch Publ Hlth, Dept Biostat, 60 Coll St, New Haven, CT 06511 USA
[2] Renmin Univ China, Ctr Appl Stat, Sch Stat, 59 Zhongguancun St, Beijing 100872, Peoples R China
[3] Renmin Univ China, Sch Stat, 59 Zhongguancun St, Beijing 100872, Peoples R China
[4] Boehringer Ingelheim Pharmaceut Inc, Global Computat Biol & Digital Sci, 900 Ridgebury Rd, Ridgefield, CT 06877 USA
Funding
National Natural Science Foundation of China
Keywords
gene expression network; heterogeneity analysis; prior information; regulation; MODEL
DOI
10.1093/biostatistics/kxae028
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Cancer is molecularly heterogeneous: seemingly similar patients can have different molecular landscapes and, accordingly, different clinical behaviors. Recent studies have shown gene expression networks to be more effective and informative for cancer heterogeneity analysis than some simpler measures. Gene interconnections can be classified as "direct" or "indirect," where the latter can be caused by shared genomic regulators (such as transcription factors, microRNAs, and other regulatory molecules) and other mechanisms. It has been suggested that incorporating the regulators of gene expression in network analysis and focusing on the direct interconnections can lead to a deeper understanding of the more essential gene interconnections. Such analysis can be seriously challenged by the large number of parameters (jointly caused by network analysis, incorporation of regulators, and heterogeneity) and by often weak signals. To tackle this problem effectively, we propose incorporating prior information contained in the published literature. A key challenge is that such prior information can be partial or even wrong. We develop a two-step procedure that can flexibly accommodate different levels of prior information quality. Simulations demonstrate the effectiveness of the proposed approach and its superiority over relevant competitors. In the analysis of a breast cancer dataset, the proposed approach yields findings that differ from those of the alternatives, and the identified sample subgroups have important clinical differences.
Pages: 16