DriverML: a machine learning algorithm for identifying driver genes in cancer sequencing studies

Cited by: 104
Authors
Han, Yi [1 ,2 ]
Yang, Juze [2 ,3 ]
Qian, Xinyi [2 ,3 ]
Cheng, Wei-Chung [4 ,5 ]
Liu, Shu-Hsuan [4 ,5 ]
Hua, Xing [6 ]
Zhou, Liyuan [2 ,3 ]
Yang, Yaning [7 ]
Wu, Qingbiao [8 ]
Liu, Pengyuan [2 ,3 ]
Lu, Yan [1 ,2 ]
Affiliations
[1] Zhejiang Univ, Sch Med, Womens Hosp, Womens Reprod Hlth Key Lab Zhejiang Prov,Ctr Uter, Hangzhou 310006, Zhejiang, Peoples R China
[2] Zhejiang Univ, Sch Med, Inst Translat Med, Hangzhou 310006, Zhejiang, Peoples R China
[3] Zhejiang Univ, Sch Med, Sir Run Run Shaw Hosp, Hangzhou 310016, Zhejiang, Peoples R China
[4] China Med Univ, Res Ctr Tumor Med Sci, Grad Inst Biomed Sci, Taichung 40402, Taiwan
[5] China Med Univ, Drug Dev Ctr, Taichung 40402, Taiwan
[6] NCI, Div Canc Epidemiol & Genet, NIH, Bethesda, MD 20892 USA
[7] Univ Sci & Technol China, Dept Stat & Finance, Hefei 230026, Anhui, Peoples R China
[8] Zhejiang Univ, Dept Math, Hangzhou 310027, Zhejiang, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
SOMATIC MUTATIONS; DISCOVERY; PATHWAYS; PATTERNS; DATABASE; GENOMES; POWER; NPAT;
DOI
10.1093/nar/gkz096
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification codes
071010 ; 081704 ;
Abstract
Although rapid progress has been made in computational approaches for prioritizing cancer driver genes, research is far from achieving the ultimate goal of discovering a complete catalog of genes truly associated with cancer. Driver gene lists predicted from these computational tools lack consistency and are prone to false positives. Here, we developed an approach (DriverML) integrating Rao's score test and supervised machine learning to identify cancer driver genes. The weight parameters in the score statistics quantified the functional impacts of mutations on the protein. To obtain optimized weight parameters, the score statistics of prior driver genes were maximized on pan-cancer training data. We conducted rigorous and unbiased benchmark analysis and comparisons of DriverML with 20 other existing tools in 31 independent datasets from The Cancer Genome Atlas (TCGA). Our comprehensive evaluations demonstrated that DriverML was robust and powerful among various datasets and outperformed the other tools with a better balance of precision and sensitivity. In vitro cell-based assays further proved the validity of the DriverML prediction of novel driver genes. In summary, DriverML uses an innovative, machine learning-based approach to prioritize cancer driver genes and provides dramatic improvements over currently existing methods.
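The abstract describes a score statistic in which each mutation is weighted by its functional impact. As a rough illustration only, the sketch below applies a generic weighted Rao score test to a Poisson count model. The model, the function name `weighted_score_test`, and the example numbers are all assumptions made for illustration; this is not the DriverML implementation, and the published method may differ in its likelihood, its background-rate estimation, and the way its weights are trained. Under the assumed model, mutation type t has observed count n_t and expected background count m_t, and its rate is m_t * exp(w_t * theta). Testing theta = 0 gives the score U = sum_t w_t (n_t - m_t) with Fisher information I = sum_t w_t^2 m_t.

```python
import math


def weighted_score_test(counts, expected, weights):
    """Generic weighted score test for excess mutations (illustrative only).

    Assumed model: count n_t ~ Poisson(m_t * exp(w_t * theta)), H0: theta = 0.
    Returns (z, p), where z = U / sqrt(I) and p is the one-sided upper-tail
    p-value from the standard normal approximation.
    """
    if not (len(counts) == len(expected) == len(weights)):
        raise ValueError("counts, expected and weights must have equal length")

    # Score: derivative of the log-likelihood with respect to theta at 0.
    score = sum(w * (n - m) for n, m, w in zip(counts, expected, weights))
    # Fisher information at theta = 0.
    info = sum(w * w * m for m, w in zip(expected, weights))
    if info <= 0:
        raise ValueError("Fisher information must be positive")

    z = score / math.sqrt(info)
    # Upper-tail probability of a standard normal: P(Z >= z).
    p = 0.5 * math.erfc(z / math.sqrt(2.0))
    return z, p


# Hypothetical gene: two mutation types, the second given zero weight.
z, p = weighted_score_test(counts=[4, 1], expected=[1.0, 1.0], weights=[1.0, 0.0])
```

In this toy setup, a larger weight lets a mutation type contribute more to both the score and its variance, which is one way a weight can stand in for functional impact. Choosing the weights by maximizing the statistic on known driver genes, as the abstract describes, would be a separate training step not shown here.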
Pages: 12