SMOTE-GBM: An Improved Classification Model for Early Folding Residues During Protein Folding

Cited by: 0
Authors
Al-Turaiki, Isra [1]
Affiliations
[1] King Saud Univ, Informat Technol Dept, Coll Comp & Informat Sci, Riyadh, Saudi Arabia
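Keywords
Early folding residue (EFR); Machine learning; Synthetic minority oversampling technique (SMOTE); Ensemble; Gradient boosted machine (GBM)
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract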
Proteins are fundamental molecules that play important roles in the cell. The function and behavior of a protein are determined by its native structure, yet the protein folding process is not well understood. Machine learning algorithms have been widely used to solve bioinformatics problems, and building predictive models from early folding residues (EFRs) has recently been investigated. However, the datasets used suffer from class imbalance, which makes the classification task difficult. In this paper, we address the class imbalance problem in an EFR dataset using the synthetic minority oversampling technique (SMOTE). We trained an ensemble model, the gradient boosted machine (GBM), on the balanced dataset and compared its performance with that of other models in the literature. Our experimental results indicate that classification performance improves when oversampling is used to overcome class imbalance. In particular, improvements were observed in precision, recall, and F-measure.
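The oversampling step named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the record gives no parameters, so the neighbour count `k`, the toy data, and the helper names `smote` and `balance` are all assumptions made for illustration. The sketch covers only the SMOTE idea, in which each synthetic minority point is interpolated between a minority sample and one of its nearest minority neighbours, and it assumes a binary labelling.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def balance(X, y, minority_label=1, k=5, seed=0):
    """Oversample the minority class until both classes have equal size.

    Assumes a binary problem: every label other than minority_label is
    counted as the majority class.
    """
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    n_new = max(0, n_majority - len(minority))
    new_points = smote(minority, n_new, k=k, seed=seed)
    return list(X) + new_points, list(y) + [minority_label] * len(new_points)


# Toy data (hypothetical): 8 majority points and 3 minority points.
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.1, 0.4),
     (0.4, 0.1), (0.2, 0.3), (0.3, 0.0),
     (1.0, 1.0), (1.2, 0.9), (0.9, 1.1)]
y = [0] * 8 + [1] * 3
X_bal, y_bal = balance(X, y, k=2)
```

In practice the balanced set would then be passed to a gradient boosting classifier, for example `GradientBoostingClassifier` in scikit-learn, and oversampling would normally be applied to the training split only, so that the evaluation data keep their original class distribution.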
Pages: 217 - 222
Page count: 6
Related papers
50 in total
  • [31] Local interactions in protein folding determined through an inverse folding model
    Bastolla, Ugo
    Porto, Markus
    Ortiz, Angel R.
    PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS, 2008, 71 (01) : 278 - 299
  • [32] Protein folding monitored at individual residues during a two-dimensional NMR experiment
    Balbach, J
    Forge, V
    Lau, WS
    vanNuland, NAJ
    Brew, K
    Dobson, CM
    SCIENCE, 1996, 274 (5290) : 1161 - 1163
  • [33] COOPERATIVE INTERACTIONS DURING PROTEIN FOLDING
    HOROVITZ, A
    FERSHT, AR
    JOURNAL OF MOLECULAR BIOLOGY, 1992, 224 (03) : 733 - 740
  • [34] Hypothetical in silico model of the early-stage intermediate in protein folding
    Kalinowska, Barbara
    Alejster, Pawel
    Salapa, Kinga
    Baster, Zbigniew
    Roterman, Irena
    JOURNAL OF MOLECULAR MODELING, 2013, 19 (10) : 4259 - 4269
  • [35] Hypothetical in silico model of the early-stage intermediate in protein folding
    Barbara Kalinowska
    Paweł Alejster
    Kinga Sałapa
    Zbigniew Baster
    Irena Roterman
    Journal of Molecular Modeling, 2013, 19 : 4259 - 4269
  • [36] LINCS AND HINGES MODEL OF PROTEIN FOLDING
    ROSE, GD
    WETLAUFER, DB
    ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, 1976, 172 (SEP3) : 6 - 6
  • [37] TOY MODEL FOR PROTEIN-FOLDING
    STILLINGER, FH
    HEADGORDON, T
    HIRSHFELD, CL
    PHYSICAL REVIEW E, 1993, 48 (02) : 1469 - 1477
  • [38] Model for the nucleation mechanism of protein folding
    Djikaev, Y. S.
    Ruckenstein, Eli
    JOURNAL OF PHYSICAL CHEMISTRY B, 2007, 111 (04) : 886 - 897
  • [39] Protein folding: Funnel model revised
    Roterman, Irena
    Slupina, Mateusz
    Konieczny, Leszek
    COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL, 2024, 23 : 3827 - 3838
  • [40] DYNAMICS OF THE CLUSTER MODEL OF PROTEIN FOLDING
    KANEHISA, MI
    TSONG, TY
    BIOPOLYMERS, 1979, 18 (06) : 1375 - 1388