XGBFEMF: An XGBoost-Based Framework for Essential Protein Prediction

Citations: 101
Authors
Zhong, Jiancheng [1]
Sun, Yusui [1]
Peng, Wei [2]
Xie, Minzhu [1]
Yang, Jiahong [1]
Tang, Xiwei [3]
Affiliations
[1] Hunan Normal Univ, Sch Informat Sci & Engn, Changsha 410081, Hunan, Peoples R China
[2] Kunming Univ Sci & Technol, Comp Ctr, Kunming 650050, Yunnan, Peoples R China
[3] Hunan First Normal Univ, Dept Informat Sci & Engn, Changsha 410205, Hunan, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Essential protein; feature engineering; multi-model fusion; XGBoost; SUB-EXPAND-SHRINK; XGBFEMF; ESSENTIAL GENES; SUBCELLULAR-LOCALIZATION; CENTRALITY; NETWORKS; DATABASE; GENOME; IDENTIFICATION; BETWEENNESS;
DOI
10.1109/TNB.2018.2842219
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Essential proteins are vital to maintaining cellular life and play an important role in biological research and drug design. As large amounts of biological data related to essential proteins have become available, an increasing number of computational methods for identifying them have been proposed. Unlike methods that adopt a single machine learning model or an ensemble machine learning method, this paper proposes a prediction framework named XGBFEMF for identifying essential proteins. The framework includes a SUB-EXPAND-SHRINK method, which constructs composite features from the original features and selects a better feature subset for essential protein prediction, and a model fusion method, which yields a more effective prediction model. We carry out experiments on yeast data to assess the performance of XGBFEMF with ROC analysis, accuracy analysis, and top analysis, and we run further experiments on E. coli data to validate that performance. The test results show that the XGBFEMF framework effectively improves many key indicators. In addition, we analyze each step of the XGBFEMF framework; the results show that each step of the SUB-EXPAND-SHRINK method, as well as the multi-model fusion step, improves prediction performance.
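The abstract names three evaluation views: ROC analysis, accuracy analysis, and top analysis. As a rough illustration only, the sketch below computes one plausible version of each from a list of true labels and predicted scores. The function names, the 0.5 threshold, the reading of "top analysis" as counting true essential proteins among the k highest-scored candidates, and the toy data are all assumptions made for this example; they are not taken from the paper.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed by pairwise comparison.

    Equals the fraction of (positive, negative) pairs in which the
    positive example has the higher score; ties count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def accuracy(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of correct predictions at a fixed threshold (assumed 0.5)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def top_k_hits(labels: Sequence[int], scores: Sequence[float], k: int) -> int:
    """Number of true essential proteins among the k highest-scored ones."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(y for _, y in ranked[:k])


# Hypothetical toy data: 1 = essential, 0 = non-essential.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]

print("AUC:", roc_auc(labels, scores))
print("Accuracy:", accuracy(labels, scores))
print("Top-3 hits:", top_k_hits(labels, scores, 3))
```

The pairwise AUC is quadratic in the number of proteins, which is fine for a small illustration; for a full proteome a rank-based computation or a library routine would be the more practical choice.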
Pages: 243-250
Page count: 8