SMOTE-GBM: An Improved Classification Model for Early Folding Residues During Protein Folding

Authors
Al-Turaiki, Isra [1]
Affiliation
[1] King Saud Univ, Informat Technol Dept, Coll Comp & Informat Sci, Riyadh, Saudi Arabia
Keywords
Early folding residue (EFR); Machine learning; Synthetic minority oversampling technique (SMOTE); Ensemble; Gradient boosted machine (GBM)
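DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract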
Proteins are fundamental molecules that play important roles in the cell. The function and behavior of a protein are determined by its native structure, yet the protein folding process is not well understood. Machine learning algorithms have been widely used to solve bioinformatics problems, and building predictive models from early folding residues (EFRs) has recently been investigated. However, the datasets used suffer from class imbalance, which makes the classification task difficult. In this paper, we address the class imbalance problem in an EFR dataset using the synthetic minority oversampling technique (SMOTE). We trained an ensemble model, the gradient boosted machine (GBM), on the balanced dataset and compared its performance with that of other models in the literature. Our experimental results indicate that classification performance improves when oversampling is used to overcome the class imbalance problem. In particular, improvements were observed in precision, recall, and F-measure.
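The oversampling idea named in the abstract can be illustrated with a minimal sketch. This record does not give the paper's implementation, so everything below is an assumption made for illustration: the helper names `smote` and `precision_recall_f1`, the toy two-feature data, and the choice of `k=3` neighbours are all hypothetical. The sketch creates synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours, and shows how precision, recall, and F-measure are computed. Training a gradient boosted machine on the balanced data is omitted.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Return n_new synthetic samples interpolated between minority neighbours.

    Assumes `minority` holds at least two feature vectors of equal length.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F-measure for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical toy data: 4 minority samples to be balanced against 10 majority samples.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
majority_count = 10
balanced_minority = minority + smote(minority, majority_count - len(minority))
```

Because each synthetic sample is a convex combination of two real minority samples, it always lies on the line segment between them. In practice, oversampling would normally be applied to the training split only, so that synthetic points do not leak into the evaluation data.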
Pages: 217-222
Number of pages: 6