SMOTE-GBM: An Improved Classification Model for Early Folding Residues During Protein Folding

Times cited: 0
Author
Al-Turaiki, Isra [1]
Affiliation
[1] King Saud Univ, Informat Technol Dept, Coll Comp & Informat Sci, Riyadh, Saudi Arabia
Keywords
Early folding residue (EFR); Machine learning; Synthetic minority oversampling technique (SMOTE); Ensemble; Gradient boosted machine (GBM)
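DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812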
Abstract
Proteins are fundamental molecules that play important roles in the cell. The function and behavior of a protein are determined by its native structure, yet the protein folding process is not well understood. Machine learning algorithms have been widely used to solve bioinformatics problems, and building predictive models from early folding residues (EFRs) has recently been investigated. However, the datasets used suffer from class imbalance, which makes the classification task difficult. In this paper, we address the class imbalance problem in an EFR dataset using the synthetic minority oversampling technique (SMOTE). We trained an ensemble model, the gradient boosted machine (GBM), on the balanced dataset and compared its performance with that of other models in the literature. Our experimental results indicate that classification performance improves when oversampling is used to overcome the class imbalance problem. In particular, improvements were observed in precision, recall, and F-measure.
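The oversampling step and the evaluation metrics named in the abstract can be illustrated with a minimal sketch. This is not the author's implementation: it is a simplified, pure-Python variant of SMOTE that interpolates between a randomly chosen minority sample and one of its nearest minority neighbours. The toy 2-D data and the names `smote`, `precision_recall_f1`, `k`, and `seed` are assumptions made for illustration, and the GBM training step is omitted.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority points (simplified SMOTE).

    Each synthetic point lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(
            tuple(b + gap * (o - b) for b, o in zip(base, other))
        )
    return synthetic


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F-measure (F1) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy, made-up data: 20 majority points and 4 minority points.
majority = [(float(i % 5), float(i // 5)) for i in range(20)]
minority = [(10.0, 10.0), (11.0, 10.5), (10.5, 11.5), (12.0, 11.0)]

# Oversample the minority class until both classes have the same size.
synthetic = smote(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + synthetic
```

In practice one would more likely rely on an established implementation, for example `SMOTE.fit_resample` from imbalanced-learn followed by a gradient boosting classifier such as scikit-learn's `GradientBoostingClassifier`. Which software the paper actually used is not stated in this record.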
Pages: 217-222
Number of pages: 6