Sequence-based prediction model of protein crystallization propensity using machine learning and two-level feature selection

Cited by: 22
Authors
Le, Nguyen Quoc Khanh [1]
Li, Wanru [2]
Cao, Yanshuang [2]
Affiliations
[1] Taipei Med Univ, Coll Med, Profess Master Program Artificial Intelligence Med, Taipei 110, Taiwan
[2] Natl Univ Singapore, Inst Syst Sci, Singapore, Singapore
Keywords
crystallization; feature selection; machine learning; protein sequence; prediction model; support vector machine; NETWORK;
DOI
10.1093/bib/bbad319
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Protein crystallization is crucial for biology, but the steps involved are complex and demanding in terms of both external factors and internal structure. To save experimental cost and time, the tendency of a protein to crystallize can first be estimated and screened by modeling. This study therefore created a new pipeline that uses the protein sequence to predict crystallization propensity at three stages: protein material production, purification, and crystal production. The pipeline proposes a new feature selection method that combines chi-square (χ²) scoring with recursive feature elimination and retains 12 selected features. Linear discriminant analysis is then applied for dimensionality reduction, and finally a support vector machine with hyperparameter tuning and 10-fold cross-validation is used to train the model and test the results. The pipeline was tested on three different datasets, and its accuracy rates are higher than those of existing pipelines. In conclusion, the model provides a new solution for predicting multistage protein crystallization propensity, which is a major challenge in computational biology.
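The abstract names chi-square scoring as the first level of its two-level feature selection. The sketch below illustrates only that first step, under stated assumptions: features are non-negative numbers (for example counts or frequencies derived from a sequence), labels are binary, and the statistic is computed from per-class feature sums against their expected values. The function names `chi2_scores` and `select_top_k`, the toy matrix, and the choice of `k = 2` are hypothetical and do not come from the paper. The recursive feature elimination, linear discriminant analysis, and support vector machine steps are not shown.

```python
from collections import defaultdict


def chi2_scores(X, y):
    """Return one chi-square statistic per feature column.

    X is a list of rows of non-negative feature values; y holds one class
    label per row. For each feature, the observed per-class sum is compared
    with the sum expected if the feature were independent of the class.
    """
    n_samples = len(X)
    n_features = len(X[0])
    class_counts = defaultdict(int)
    observed = defaultdict(lambda: [0.0] * n_features)
    for row, label in zip(X, y):
        class_counts[label] += 1
        sums = observed[label]
        for j, value in enumerate(row):
            sums[j] += value
    # Total of each feature over all samples.
    totals = [sum(observed[c][j] for c in observed) for j in range(n_features)]
    scores = []
    for j in range(n_features):
        stat = 0.0
        for label, count in class_counts.items():
            expected = totals[j] * count / n_samples
            if expected > 0:
                stat += (observed[label][j] - expected) ** 2 / expected
        scores.append(stat)
    return scores


def select_top_k(scores, k):
    """Return the indices of the k highest-scoring features, in column order."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Toy data (hypothetical): 4 samples, 3 features, binary labels.
X = [
    [5, 1, 2],
    [4, 1, 3],
    [0, 1, 2],
    [1, 1, 2],
]
y = [1, 1, 0, 0]

scores = chi2_scores(X, y)
selected = select_top_k(scores, k=2)
print(scores)
print(selected)
```

In this toy matrix the first column differs sharply between the two classes, the second is constant, and the third differs only slightly, so a chi-square filter keeps the first and third columns. In a full pipeline of the kind the abstract describes, the surviving columns would then be passed to the second selection level.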
Pages: 8