A Novel Computational Framework to Predict Disease-Related Copy Number Variations by Integrating Multiple Data Sources

Cited by: 6
Authors
Yuan, Lin [1]
Sun, Tao [1]
Zhao, Jing [1]
Shen, Zhen [2]
Affiliations
[1] Qilu Univ Technol, Shandong Acad Sci, Sch Comp Sci & Technol, Jinan, Peoples R China
[2] Nanyang Inst Technol, Sch Comp & Software, Nanyang, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
CNV; multi-omics data; path association analysis; stability selection; prostate cancer; ASSOCIATION ANALYSIS; WIDE ASSOCIATION; GENE-EXPRESSION; PHOSPHORYLATION; IDENTIFICATION; REGRESSION; ARCHIVES;
DOI
10.3389/fgene.2021.696956
Chinese Library Classification (CLC) number
Q3 [Genetics];
Discipline classification code
071007; 090102;
Abstract
Copy number variation (CNV) may contribute to the development of complex diseases. However, due to the complex mechanism of path association and the lack of sufficient samples, understanding the relationship between CNV and cancer remains a major challenge. The unprecedented abundance of CNV, gene, and disease label data provides an opportunity to design a new machine learning framework to predict potential disease-related CNVs. In this paper, we developed a novel machine learning approach, namely, IHI-BMLLR (Integrating Heterogeneous Information sources with Biweight Mid-correlation and L1-regularized Logistic Regression under stability selection), to predict CNV-disease path associations from a data set containing CNV, disease state labels, and gene data. CNVs, genes, and diseases are connected through edges and together constitute a biological association network. To construct this network, we first used a self-adaptive biweight mid-correlation (BM) formula to calculate correlation coefficients between CNVs and genes. Then, we used L1-penalized logistic regression (LLR) to detect genes related to disease. When applying self-adaptive BM and LLR, we added a stability selection strategy, which can effectively reduce false positives. Finally, a weighted path search algorithm was applied to find the top D path associations and important CNVs. The experimental results on both simulation and prostate cancer data show that IHI-BMLLR is significantly better than two state-of-the-art CNV detection methods (i.e., CCRET and DPtest) under false-positive control. Furthermore, we applied IHI-BMLLR to prostate cancer data and found significant path associations. Three new cancer-related genes were discovered in the paths, and these genes need to be verified by biological research in the future.
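For readers unfamiliar with the correlation measure named in the abstract, the following is a minimal sketch of the standard (textbook) biweight midcorrelation between one CNV profile and one gene profile. It is not the authors' self-adaptive variant, whose details are not given in this record; the function name `biweight_midcorrelation` and the tuning constant `c = 9` are illustrative assumptions.

```python
import numpy as np


def biweight_midcorrelation(x, y, c=9.0):
    """Standard biweight midcorrelation of two 1-D samples.

    Assumption: this is the textbook formula with tuning constant c,
    not the self-adaptive version described in the paper.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("x and y must be 1-D arrays of equal length")

    def normalized_weighted_deviations(v):
        med = np.median(v)
        mad = np.median(np.abs(v - med))  # median absolute deviation
        if mad == 0:
            raise ValueError("MAD is zero; the biweight weights are undefined")
        u = (v - med) / (c * mad)
        # Tukey biweight: points with |u| >= 1 (far from the median) get weight 0.
        w = (1.0 - u**2) ** 2 * (np.abs(u) < 1)
        d = (v - med) * w
        return d / np.sqrt(np.sum(d**2))

    return float(np.sum(normalized_weighted_deviations(x)
                        * normalized_weighted_deviations(y)))


if __name__ == "__main__":
    cnv = np.arange(1.0, 11.0)   # hypothetical CNV values
    gene = cnv.copy()
    gene[-1] = 1000.0            # one gross outlier in the gene profile
    print("biweight midcorrelation:", biweight_midcorrelation(cnv, gene))
    print("Pearson correlation:    ", np.corrcoef(cnv, gene)[0, 1])
```

Because observations far from the median receive zero weight, a single extreme value distorts this measure less than it distorts the Pearson correlation, which is the usual motivation for using it on noisy omics data.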
Pages: 12