A Novel Computational Framework to Predict Disease-Related Copy Number Variations by Integrating Multiple Data Sources

Cited by: 6
Authors
Yuan, Lin [1]
Sun, Tao [1]
Zhao, Jing [1]
Shen, Zhen [2]
Affiliations
[1] Qilu Univ Technol, Shandong Acad Sci, Sch Comp Sci & Technol, Jinan, Peoples R China
[2] Nanyang Inst Technol, Sch Comp & Software, Nanyang, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
CNV; multi-omics data; path association analysis; stability selection; prostate cancer; ASSOCIATION ANALYSIS; WIDE ASSOCIATION; GENE-EXPRESSION; PHOSPHORYLATION; IDENTIFICATION; REGRESSION; ARCHIVES
DOI
10.3389/fgene.2021.696956
Chinese Library Classification (CLC) number
Q3 [Genetics]
Discipline classification code
071007; 090102
Abstract
Copy number variation (CNV) may contribute to the development of complex diseases. However, because the mechanism of path association is complex and sufficient samples are lacking, understanding the relationship between CNV and cancer remains a major challenge. The unprecedented abundance of CNV, gene, and disease label data provides an opportunity to design a new machine learning framework for predicting potential disease-related CNVs. In this paper, we developed a novel machine learning approach, IHI-BMLLR (Integrating Heterogeneous Information sources with Biweight Mid-correlation and L1-regularized Logistic Regression under stability selection), to predict CNV-disease path associations from a data set containing CNVs, disease state labels, and gene data. CNVs, genes, and diseases are connected by edges and together constitute a biological association network. To construct this network, we first used a self-adaptive biweight mid-correlation (BM) formula to calculate correlation coefficients between CNVs and genes. We then used logistic regression with an L1 penalty (LLR) to detect disease-related genes. Both the self-adaptive BM and LLR steps were run under a stability selection strategy, which can effectively reduce false positives. Finally, a weighted path search algorithm was applied to find the top D path associations and the important CNVs. Experimental results on both simulated and prostate cancer data show that IHI-BMLLR performs significantly better than two state-of-the-art CNV detection methods (CCRET and DPtest) under false-positive control. Furthermore, we applied IHI-BMLLR to prostate cancer data and found significant path associations. Three new cancer-related genes were discovered in these paths; they remain to be verified by future biological research.
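As a rough illustration of two steps named in the abstract, the sketch below computes the standard (not the paper's self-adaptive) biweight mid-correlation between a CNV profile and a gene profile, and then ranks CNV-gene-disease paths by a simple score. The function names `bicor` and `top_paths`, the tuning constant 9, and the scoring rule (absolute correlation times a per-gene disease weight, such as a stability-selection frequency) are assumptions made for illustration; the record does not specify the authors' actual formulas.

```python
from math import sqrt
from statistics import median
import heapq


def _biweight_normalized(values):
    """Return biweight-weighted, unit-norm deviations from the median."""
    values = list(values)
    med = median(values)
    mad = median(abs(v - med) for v in values)  # median absolute deviation
    if mad == 0:
        raise ValueError("MAD is zero; biweight mid-correlation is undefined")
    weighted = []
    for v in values:
        u = (v - med) / (9.0 * mad)  # 9 is the conventional tuning constant
        w = (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0  # outliers get weight 0
        weighted.append((v - med) * w)
    norm = sqrt(sum(t * t for t in weighted))
    return [t / norm for t in weighted]


def bicor(x, y):
    """Standard biweight mid-correlation of two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    xs = _biweight_normalized(x)
    ys = _biweight_normalized(y)
    return sum(a * b for a, b in zip(xs, ys))


def top_paths(cnv_gene_corr, gene_disease_weight, d):
    """Rank CNV -> gene -> disease paths by |correlation| * gene weight.

    cnv_gene_corr: dict mapping (cnv, gene) to a correlation coefficient.
    gene_disease_weight: dict mapping gene to a non-negative weight
        (hypothetical: e.g. how often LLR selected the gene).
    Returns the d highest-scoring (score, cnv, gene) tuples.
    """
    scored = []
    for (cnv, gene), r in cnv_gene_corr.items():
        w = gene_disease_weight.get(gene, 0.0)
        if w > 0.0:  # genes never linked to the disease form no path
            scored.append((abs(r) * w, cnv, gene))
    return heapq.nlargest(d, scored)
```

Because the weights are built from the median and the median absolute deviation, samples far from the bulk of the data receive zero weight, which is what makes this correlation more robust to outliers than Pearson's coefficient.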
Pages: 12