Sequence-based prediction model of protein crystallization propensity using machine learning and two-level feature selection

Cited by: 22
Authors
Le, Nguyen Quoc Khanh [1]
Li, Wanru [2]
Cao, Yanshuang [2]
Affiliations
[1] Taipei Med Univ, Coll Med, Profess Master Program Artificial Intelligence Med, Taipei 110, Taiwan
[2] Natl Univ Singapore, Inst Syst Sci, Singapore, Singapore
Keywords
crystallization; feature selection; machine learning; protein sequence; prediction model; support vector machine; NETWORK;
DOI
10.1093/bib/bbad319
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Protein crystallization is crucial for biology, but the steps involved are complex and demanding in terms of both external factors and internal structure. To save experimental cost and time, the tendency of a protein to crystallize can first be estimated and screened by modeling. This study therefore created a new pipeline that uses the protein sequence to predict crystallization propensity at three stages: protein material production, purification and crystal production. The pipeline proposes a new feature selection method that combines chi-square (χ²) and recursive feature elimination together with the 12 selected features, followed by linear discriminant analysis for dimensionality reduction. Finally, a support vector machine with hyperparameter tuning and 10-fold cross-validation is used to train the model and test the results. The pipeline was tested on three different datasets, and its accuracy rates are higher than those of existing pipelines. In conclusion, our model provides a new solution for predicting multistage protein crystallization propensity, which is a major challenge in computational biology.
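The stages named in the abstract (chi-square filtering, recursive feature elimination, linear discriminant analysis, then a tuned support vector machine under 10-fold cross-validation) can be sketched with scikit-learn roughly as follows. This is a minimal illustration and not the authors' implementation. The synthetic data, the helper names `make_toy_data` and `build_search`, the 20-feature chi-square cutoff, the logistic-regression estimator inside RFE, the RBF kernel and the hyperparameter grid are all assumptions made for the example. Treating 12 as the number of features left after RFE is one reading of the abstract.

```python
# Hedged sketch: chi-square filter -> RFE -> LDA -> SVM, tuned with 10-fold CV.
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_selection import RFE, SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC


def make_toy_data():
    """Synthetic stand-in for sequence-derived features (not the paper's data)."""
    return make_classification(
        n_samples=200, n_features=40, n_informative=10, random_state=0
    )


def build_search(n_chi2=20, n_final=12):
    """Return a grid search over a two-level selection + LDA + SVM pipeline.

    n_chi2 (assumed) is the number of features kept by the chi-square filter;
    n_final is the number kept after recursive feature elimination.
    """
    pipeline = Pipeline([
        ("scale", MinMaxScaler()),  # chi2 requires non-negative inputs
        ("chi2", SelectKBest(chi2, k=n_chi2)),  # level 1: chi-square filter
        ("rfe", RFE(LogisticRegression(max_iter=1000),
                    n_features_to_select=n_final)),  # level 2: RFE (assumed estimator)
        ("lda", LinearDiscriminantAnalysis()),  # one component for two classes
        ("svm", SVC(kernel="rbf")),  # assumed kernel
    ])
    param_grid = {"svm__C": [0.1, 1, 10], "svm__gamma": ["scale", 0.1]}  # assumed grid
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    return GridSearchCV(pipeline, param_grid, cv=cv, scoring="accuracy")


X, y = make_toy_data()
search = build_search().fit(X, y)
print("best params:", search.best_params_)
print("mean 10-fold CV accuracy:", round(search.best_score_, 3))
```

Placing both selection steps inside the `Pipeline` means they are refit on each training fold, so the cross-validated accuracy is not inflated by features chosen with knowledge of the held-out fold.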
Pages: 8