SMOTE-GBM: An Improved Classification Model for Early Folding Residues During Protein Folding

Author
Al-Turaiki, Isra [1]
Affiliation
[1] King Saud Univ, Informat Technol Dept, Coll Comp & Informat Sci, Riyadh, Saudi Arabia
Keywords
Early folding residue (EFR); Machine learning; Synthetic minority oversampling technique (SMOTE); Ensemble; Gradient boosted machine (GBM)
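DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract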
Proteins are fundamental molecules that play important roles in the cell. The function and behavior of a protein are determined by its native structure, yet the protein folding process is not well understood. Machine learning algorithms have been widely used to solve bioinformatics problems, and building predictive models for early folding residues (EFRs) has recently been investigated. However, the datasets used suffer from class imbalance, which makes the classification task difficult. In this paper, we address the class imbalance problem in an EFR dataset using the synthetic minority oversampling technique (SMOTE). We trained an ensemble model, the gradient boosted machine (GBM), on the balanced dataset and compared its performance with that of other models in the literature. Our experimental results indicate that better classification performance is obtained when oversampling is used to overcome the class imbalance problem. In particular, improvements were observed in precision, recall, and F-measure.
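The abstract does not give implementation details, so the following is only a minimal sketch of the two ideas it names: SMOTE-style oversampling of the minority class, and the precision, recall, and F-measure used for evaluation. It uses only the Python standard library. The function names (`smote`, `precision_recall_f1`), the parameter `k`, and the toy data are assumptions made for illustration and are not taken from the paper. The gradient boosted classifier itself is omitted; in practice a library implementation (for example `SMOTE` from imbalanced-learn and `GradientBoostingClassifier` from scikit-learn) would likely be used instead.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between a minority sample
    and one of its k nearest minority-class neighbours (SMOTE-style)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F-measure for the positive (minority) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy data: 20 majority samples and 4 minority samples.
majority = [[float(x), float(y)] for x in range(5) for y in range(4)]
minority = [[10.0, 10.0], [11.0, 10.5], [10.5, 11.0], [11.5, 11.5]]

# Oversample the minority class until both classes have the same size.
new_points = smote(minority, n_synthetic=len(majority) - len(minority), k=3)
balanced_minority = minority + new_points
```

Oversampling of this kind is normally applied to the training split only, so that the held-out data used to compute precision, recall, and F-measure contains no synthetic samples.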
Pages: 217-222
Page count: 6