MSINGB: A Novel Computational Method Based on NGBoost for Identifying Microsatellite Instability Status from Tumor Mutation Annotation Data

Cited by: 5
Authors
Chen, Jinxiang [1]
Wang, Miao [1]
Zhao, Defeng [1]
Li, Fuyi [2]
Wu, Hao [3]
Liu, Quanzhong [1]
Li, Shuqin [1]
Affiliations
[1] Northwest A&F Univ, Coll Informat Engn, Yangling 712100, Shaanxi, Peoples R China
[2] Univ Melbourne, Peter Doherty Inst Infect & Immun, Dept Microbiol & Immunol, 792 Elizabeth St, Melbourne, Vic 3000, Australia
[3] Shandong Univ, Sch Software, Jinan 250100, Shandong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Microsatellite instability; Machine learning; NGBoost; Feature selection; COLORECTAL-CANCER; CLASSIFICATION;
DOI
10.1007/s12539-022-00544-w
Chinese Library Classification (CLC) number
Q [Biological Sciences];
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract
Microsatellite instability (MSI), a vital mutator phenotype caused by DNA mismatch repair deficiency, is frequently observed in several tumors. MSI is recognized as a critical molecular biomarker for diagnosis, prognosis, and therapeutic selection in several cancers. Identifying MSI status with the current gold-standard methods, which rely on experimental analysis, is laborious, time-consuming, and costly. Although several machine learning-based computational methods have been proposed to identify MSI status, it remains unclear which machine learning model is best suited to MSI identification and which feature subset is most strongly related to MSI. Answering these questions would allow more effective machine learning-based methods to be developed for MSI status identification. In this work, we present MSINGB, an NGBoost-based method for identifying MSI status from tumor somatic mutation annotation data. MSINGB first evaluates the prediction performance of 11 popular machine learning algorithms and 9 deep learning models for identifying MSI. Among these 20 models, NGBoost, a novel natural gradient boosting method, achieves the best overall performance. MSINGB then applies two feature selection strategies to find a compact feature subset that is strongly related to MSI, and employs the SHAP approach to interpret how the selected features affect the model's predictions. MSINGB achieves better prediction performance than state-of-the-art methods on both the tenfold cross-validation test and the independent test.
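The model-comparison step described in the abstract (score several candidate models by tenfold cross-validation, then keep the best one) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' pipeline: the two toy classifiers stand in for the 20 models the paper compares, NGBoost itself is not used, and the single feature, the synthetic data, and all function and class names are assumptions made for illustration.

```python
import random
from statistics import mean


def stratified_kfold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs that preserve class ratios in each fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)  # deal each class round-robin across folds
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


class MajorityBaseline:
    """Toy model: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label = max(set(y), key=y.count)
        return self

    def predict(self, X):
        return [self.label for _ in X]


class ThresholdStump:
    """Toy model: predicts 1 when the single feature exceeds a learned threshold."""

    def fit(self, X, y):
        best_acc, self.threshold = -1.0, 0.0
        for t in sorted(set(X)):
            acc = mean(int((x > t) == bool(label)) for x, label in zip(X, y))
            if acc > best_acc:
                best_acc, self.threshold = acc, t
        return self

    def predict(self, X):
        return [int(x > self.threshold) for x in X]


def cross_val_accuracy(make_model, X, y, k=10, seed=0):
    """Mean accuracy over k stratified folds; the model is refit on every fold."""
    scores = []
    for train, test in stratified_kfold(y, k, seed):
        model = make_model().fit([X[i] for i in train], [y[i] for i in train])
        preds = model.predict([X[i] for i in test])
        scores.append(mean(int(p == y[i]) for p, i in zip(preds, test)))
    return mean(scores)


# Synthetic, hypothetical data: label 1 = MSI, label 0 = microsatellite stable.
# The feature loosely mimics a per-tumor mutation count; values are made up.
rng = random.Random(42)
y = [1] * 40 + [0] * 160
X = [rng.gauss(60, 10) if label else rng.gauss(10, 5) for label in y]

candidates = {"majority": MajorityBaseline, "stump": ThresholdStump}
results = {name: cross_val_accuracy(cls, X, y) for name, cls in candidates.items()}
best = max(results, key=results.get)
print(results, "selected:", best)
```

Stratifying the folds keeps the minority MSI class represented in every test split, which matters when the classes are imbalanced, as they are in this synthetic example.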
Pages: 100-110
Number of pages: 11