Sequence-based prediction model of protein crystallization propensity using machine learning and two-level feature selection

Cited by: 22
Authors
Le, Nguyen Quoc Khanh [1]
Li, Wanru [2]
Cao, Yanshuang [2]
Affiliations
[1] Taipei Med Univ, Coll Med, Profess Master Program Artificial Intelligence Med, Taipei 110, Taiwan
[2] Natl Univ Singapore, Inst Syst Sci, Singapore, Singapore
Keywords
crystallization; feature selection; machine learning; protein sequence; prediction model; support vector machine; NETWORK;
DOI
10.1093/bib/bbad319
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Protein crystallization is crucial for biology, but the steps involved are complex and demanding in terms of both external factors and internal structure. To save experimental cost and time, the tendency of a protein to crystallize can first be estimated and screened by modeling. This study therefore created a new pipeline that uses the protein sequence to predict crystallization propensity at three stages: protein material production, purification and crystal production. The pipeline proposes a new feature selection method that combines chi-square (χ²) and recursive feature elimination, together with the 12 selected features. This is followed by linear discriminant analysis for dimensionality reduction and, finally, a support vector machine with hyperparameter tuning and 10-fold cross-validation to train the model and test the results. The pipeline was tested on three different datasets, and its accuracy was higher than that of existing pipelines. In conclusion, our model provides a new solution for predicting multistage protein crystallization propensity, which is a major challenge in computational biology.
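The order of steps named in the abstract (chi-square filtering, recursive feature elimination, linear discriminant analysis, then a tuned support vector machine scored by 10-fold cross-validation) can be sketched with scikit-learn as below. This is a minimal illustration, not the authors' implementation. The abstract states only the 12 selected features and the 10 folds; the synthetic data, the min-max scaling, the intermediate count `n_chi2=30`, the logistic-regression estimator inside RFE, and the hyperparameter grid are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_selection import RFE, SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC


def build_search(n_chi2=30, n_final=12, seed=0):
    """Two-level feature selection -> LDA -> SVM, tuned with 10-fold CV.

    n_chi2 (features kept by the chi-square filter) and the grid below are
    hypothetical values; only n_final=12 and 10 folds come from the abstract.
    """
    pipe = Pipeline([
        # chi2 needs non-negative inputs, so rescale to [0, 1] first (assumption).
        ("scale", MinMaxScaler()),
        # Level 1: keep the n_chi2 features with the highest chi-square score.
        ("chi2", SelectKBest(chi2, k=n_chi2)),
        # Level 2: recursively drop the weakest feature until n_final remain.
        ("rfe", RFE(LogisticRegression(max_iter=1000),
                    n_features_to_select=n_final)),
        # Binary labels allow at most one discriminant component.
        ("lda", LinearDiscriminantAnalysis(n_components=1)),
        ("svm", SVC()),
    ])
    grid = {"svm__C": [0.1, 1, 10], "svm__kernel": ["linear", "rbf"]}
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    return GridSearchCV(pipe, grid, cv=cv, scoring="accuracy")


if __name__ == "__main__":
    # Synthetic stand-in for sequence-derived feature vectors (assumption).
    X, y = make_classification(n_samples=300, n_features=40,
                               n_informative=8, random_state=0)
    search = build_search().fit(X, y)
    print("best params:", search.best_params_)
    print("mean 10-fold accuracy:", round(search.best_score_, 3))
```

Placing the selectors inside the `Pipeline` means they are refit on each training fold, so the cross-validated accuracy is not inflated by features chosen with knowledge of the held-out fold.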
Pages: 8