PredCID: prediction of driver frameshift indels in human cancer

Cited by: 25
Authors
Yue, Zhenyu [1]
Chu, Xinlu [2]
Xia, Junfeng [3]
Affiliations
[1] Anhui Agr Univ, Sch Informat & Comp, Hefei, Peoples R China
[2] Anhui Univ, Inst Phys Sci & Informat Technol, Hefei, Peoples R China
[3] Anhui Univ, Key Lab Intelligent Comp & Signal Proc, Minist Educ, Inst Phys Sci & Informat Technol, Hefei 230601, Anhui, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
cancer; driver mutation; frameshift indel; machine learning; PATHOGENICITY; NUCLEOTIDE; MUTATIONS; SEQUENCE; FEATURES; DATABASE; GENOME;
DOI
10.1093/bib/bbaa119
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
The discrimination of driver from passenger mutations has been a hot topic in cancer biology. Although recent advances have improved the identification of driver mutations in cancer genomic research, no computational method yet exists specifically for cancer frameshift indels (insertions and/or deletions). In addition, existing pathogenic frameshift indel predictors may suffer from many missing values because different transcripts are chosen during variant annotation. In this study, we propose a computational model, PredCID (Predictor for Cancer driver frameshift InDels), for accurately predicting cancer driver frameshift indels. Gene-, DNA-, transcript- and protein-level features are combined and selected for classification with an eXtreme Gradient Boosting classifier. Benchmarking on the cross-validation dataset and an independent dataset showed that PredCID achieves better and more robust performance than existing noncancer-specific methods in distinguishing cancer driver frameshift indels from passengers, and it is therefore a valuable method for a deeper understanding of frameshift indels in human cancer. PredCID is freely available for academic research at http://bioinfo.ahu.edu.cn:8080/PredCID.
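For readers unfamiliar with the term, the following is a minimal sketch of how a frameshift indel can be told apart from other variants. It is not part of PredCID. It assumes VCF-style reference and alternate allele strings that share an anchor base, and the function names and example alleles are hypothetical. An indel shifts the reading frame when its net length change is not a multiple of three.

```python
def indel_length_change(ref: str, alt: str) -> int:
    """Net change in sequence length (positive = insertion, negative = deletion)."""
    return len(alt) - len(ref)


def is_frameshift(ref: str, alt: str) -> bool:
    """True when the variant changes length by a number of bases not divisible by 3."""
    delta = indel_length_change(ref, alt)
    # delta == 0 means a substitution, which is not an indel at all.
    return delta != 0 and delta % 3 != 0


# Hypothetical alleles, made up for illustration only.
examples = [
    ("A", "AT"),    # 1-bp insertion
    ("ACGT", "A"),  # 3-bp deletion (in-frame)
    ("AC", "A"),    # 1-bp deletion
    ("A", "G"),     # substitution
]
results = [is_frameshift(ref, alt) for ref, alt in examples]
```

This check only looks at allele lengths. Whether the indel falls in a coding region, and in which transcript, has to come from a separate annotation step.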
Pages: 9
Related papers
35 in total
[1]   A map of human genome variation from population-scale sequencing [J].
Altshuler, David ;
Durbin, Richard M. ;
Abecasis, Goncalo R. ;
Bentley, David R. ;
Chakravarti, Aravinda ;
Clark, Andrew G. ;
Collins, Francis S. ;
De la Vega, Francisco M. ;
Donnelly, Peter ;
Egholm, Michael ;
Flicek, Paul ;
Gabriel, Stacey B. ;
Gibbs, Richard A. ;
Knoppers, Bartha M. ;
Lander, Eric S. ;
Lehrach, Hans ;
Mardis, Elaine R. ;
McVean, Gil A. ;
Nickerson, Debbie A. ;
Peltonen, Leena ;
Schafer, Alan J. ;
Sherry, Stephen T. ;
Wang, Jun ;
Wilson, Richard K. ;
Gibbs, Richard A. ;
Deiros, David ;
Metzker, Mike ;
Muzny, Donna ;
Reid, Jeff ;
Wheeler, David ;
Wang, Jun ;
Li, Jingxiang ;
Jian, Min ;
Li, Guoqing ;
Li, Ruiqiang ;
Liang, Huiqing ;
Tian, Geng ;
Wang, Bo ;
Wang, Jian ;
Wang, Wei ;
Yang, Huanming ;
Zhang, Xiuqing ;
Zheng, Huisong ;
Lander, Eric S. ;
Altshuler, David L. ;
Ambrogio, Lauren ;
Bloom, Toby ;
Cibulskis, Kristian ;
Fennell, Tim J. ;
Gabriel, Stacey B. .
NATURE, 2010, 467 (7319) :1061-1073
[2]   Cancer-Specific High-Throughput Annotation of Somatic Mutations: Computational Prediction of Driver Missense Mutations [J].
Carter, Hannah ;
Chen, Sining ;
Isik, Leyla ;
Tyekucheva, Svitlana ;
Velculescu, Victor E. ;
Kinzler, Kenneth W. ;
Vogelstein, Bert ;
Karchin, Rachel .
CANCER RESEARCH, 2009, 69 (16) :6660-6667
[3]   XGBoost: A Scalable Tree Boosting System [J].
Chen, Tianqi ;
Guestrin, Carlos .
KDD'16: PROCEEDINGS OF THE 22ND ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2016, :785-794
[4]   Large-scale comparative assessment of computational predictors for lysine post-translational modification sites [J].
Chen, Zhen ;
Liu, Xuhan ;
Li, Fuyi ;
Li, Chen ;
Marquez-Lago, Tatiana ;
Leier, Andre ;
Akutsu, Tatsuya ;
Webb, Geoffrey I. ;
Xu, Dakang ;
Smith, Alexander Ian ;
Li, Lei ;
Chou, Kuo-Chen ;
Song, Jiangning .
BRIEFINGS IN BIOINFORMATICS, 2019, 20 (06) :2267-2290
[5]   Comparison and integration of computational methods for deleterious synonymous mutation prediction [J].
Cheng, Na ;
Li, Menglu ;
Zhao, Le ;
Zhang, Bo ;
Yang, Yuhua ;
Zheng, Chun-Hou ;
Xia, Junfeng .
BRIEFINGS IN BIOINFORMATICS, 2020, 21 (03) :970-981
[6]   A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3 [J].
Cingolani, Pablo ;
Platts, Adrian ;
Wang, Le Lily ;
Coon, Melissa ;
Tung Nguyen ;
Wang, Luan ;
Land, Susan J. ;
Lu, Xiangyi ;
Ruden, Douglas M. .
FLY, 2012, 6 (02) :80-92
[7]  
Davis J., 2006, P 23 INT C MACH LEAR, P233, DOI 10.1145/1143844.1143874
[8]   Assessing the Pathogenicity of Insertion and Deletion Variants with the Variant Effect Scoring Tool (VEST-Indel) [J].
Douville, Christopher ;
Masica, David L. ;
Stenson, Peter D. ;
Cooper, David N. ;
Gygax, Derek M. ;
Kim, Rick ;
Ryan, Michael ;
Karchin, Rachel .
HUMAN MUTATION, 2016, 37 (01) :28-35
[9]   The Pfam protein families database: towards a more sustainable future [J].
Finn, Robert D. ;
Coggill, Penelope ;
Eberhardt, Ruth Y. ;
Eddy, Sean R. ;
Mistry, Jaina ;
Mitchell, Alex L. ;
Potter, Simon C. ;
Punta, Marco ;
Qureshi, Matloob ;
Sangrador-Vegas, Amaia ;
Salazar, Gustavo A. ;
Tate, John ;
Bateman, Alex .
NUCLEIC ACIDS RESEARCH, 2016, 44 (D1) :D279-D285
[10]   DDIG-in: detecting disease-causing genetic variations due to frameshifting indels and nonsense mutations employing sequence and structural properties at nucleotide and protein levels [J].
Folkman, Lukas ;
Yang, Yuedong ;
Li, Zhixiu ;
Stantic, Bela ;
Sattar, Abdul ;
Mort, Matthew ;
Cooper, David N. ;
Liu, Yunlong ;
Zhou, Yaoqi .
BIOINFORMATICS, 2015, 31 (10) :1599-1606