Machine Learning Based Prediction of Enzymatic Degradation of Plastics Using Encoded Protein Sequence and Effective Feature Representation

Cited by: 14
Authors
Jiang, Renjing [1]
Shang, Lanyu [2]
Wang, Ruohan [1]
Wang, Dong [2]
Wei, Na [1]
Affiliations
[1] Univ Illinois, Dept Civil & Environm Engn, Urbana, IL 61801 USA
[2] Univ Illinois, Sch Informat Sci, Champaign, IL 61820 USA
Funding
National Science Foundation (USA)
Keywords
Machine learning; plastic waste; enzymatic degradation; enzyme function; sequence representation; HEAT-CAPACITY; PORE-SIZE; TECHNOLOGIES; DEPOLYMERASE; HYDROLYSIS; DIFFUSION; SUBSTRATE
DOI
10.1021/acs.estlett.3c00293
Chinese Library Classification (CLC) number
X [Environmental Science, Safety Science]
Discipline classification code
08 ; 0830 ;
Abstract
Enzyme biocatalysis for plastic treatment and recycling is an emerging field of growing interest. However, it is challenging and time-consuming to identify plastic-degrading enzymes with desirable functionality, given the large number of putative enzyme sequences. There is a critical need to develop an effective approach to accurately predict the enzyme activity in degrading different types of plastics. In this study, we developed a machine-learning-based plastic enzymatic degradation (PED) framework to predict the ability of an enzyme to degrade plastics of interest by exploring and recognizing hidden patterns in protein sequences. A data set integrating information from a wide range of experimentally verified enzymes and various common plastic substrates was created. A new context-aware enzyme sequence representation (CESR) mechanism was developed to learn the abundant contextual information in enzyme sequences, and feature extraction was performed for enzymes at both the amino acid level and global sequence level. Thirteen machine learning classification algorithms were compared, and XGBoost was identified as the best-performing algorithm. PED achieved an overall accuracy of 90.2% and outperformed sequence-based protein classification models from the existing literature. Furthermore, important enzyme features in plastic degradation were identified and comprehensively interpreted. This study demonstrated a new tool for the prediction and discovery of plastic-degrading enzymes.
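The abstract mentions feature extraction at both the amino acid level and the global sequence level. As a rough illustration only, the sketch below computes amino acid composition plus two whole-sequence descriptors from a protein sequence. It is not the paper's CESR mechanism or its actual feature set: the function name `featurize`, the choice of descriptors, and the `HYDROPHOBIC` residue set are all assumptions made for this example. A vector like this could in principle be passed to a classifier such as XGBoost.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so feature positions are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical grouping used only for this illustration.
HYDROPHOBIC = set("AVILMFW")


def featurize(seq):
    """Return a 22-element feature vector for a protein sequence.

    Positions 0-19: fraction of each standard amino acid (residue-level view).
    Position 20:    number of standard residues (global view).
    Position 21:    fraction of hydrophobic residues (global view).
    Non-standard characters are ignored.
    """
    residues = [ch for ch in seq.upper() if ch in AMINO_ACIDS]
    n = len(residues)
    if n == 0:
        # No usable residues: return an all-zero vector of the same length.
        return [0.0] * (len(AMINO_ACIDS) + 2)

    counts = Counter(residues)
    composition = [counts[a] / n for a in AMINO_ACIDS]
    hydrophobic_fraction = sum(1 for ch in residues if ch in HYDROPHOBIC) / n
    return composition + [float(n), hydrophobic_fraction]


if __name__ == "__main__":
    # Toy sequence, not a real enzyme.
    vector = featurize("MKVLAAGIW")
    print(len(vector), vector[20], round(vector[21], 3))
```

Fixed-length vectors of this kind are a common baseline for sequence-based classifiers. A context-aware representation, as the abstract describes, would additionally capture the order and neighborhood of residues, which simple composition discards.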
Pages: 557-564
Number of pages: 8
Related papers
50 in total
  • [1] Protein Secondary Structural Class Prediction using Effective Feature Modeling and Machine Learning Techniques
    Bankapur, Sanjay
    Patil, Nagamma
    PROCEEDINGS 2018 IEEE 18TH INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOENGINEERING (BIBE), 2018: 18-21
  • [2] Accurate prediction of Snare Protein Sequence using Machine Learning
    Talpur, Dani Bux
    Shaikh, Salahuddin
    Khowaja, Ashfaque
    Adnan, Saifullah
    Ghulam, Ali
    BIOSCIENCE RESEARCH, 2022, 19 (03): 1414-1422
  • [3] Sequence-based prediction model of protein crystallization propensity using machine learning and two-level feature selection
    Le, Nguyen Quoc Khanh
    Li, Wanru
    Cao, Yanshuang
    BRIEFINGS IN BIOINFORMATICS, 2023, 24 (05)
  • [4] Boosting phosphorylation site prediction with sequence feature-based machine learning
    Maiti, Shyantani
    Hassan, Atif
    Mitra, Pralay
    PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS, 2020, 88 (02): 284-291
  • [5] Antiprotozoal peptide prediction using machine learning with effective feature selection techniques
    Periwal, Neha
    Arora, Pooja
    Thakur, Ananya
    Agrawal, Lakshay
    Goyal, Yash
    Rathore, Anand S.
    Anand, Harsimrat Singh
    Kaur, Baljeet
    Sood, Vikas
    HELIYON, 2024, 10 (16)
  • [6] Prediction of peptide binding to MHC using machine learning with sequence and structure-based feature sets
    Aranha, Michelle P.
    Spooner, Catherine
    Demerdash, Omar
    Czejdo, Bogdan
    Smith, Jeremy C.
    Mitchell, Julie C.
    BIOCHIMICA ET BIOPHYSICA ACTA-GENERAL SUBJECTS, 2020, 1864 (04)
  • [7] Sequence Alignment Using Machine Learning for Accurate Template-based Protein Structure Prediction
    Makigaki, Shuichiro
    Ishida, Takashi
    BIO-PROTOCOL, 2020, 10 (09)
  • [8] Sequence alignment using machine learning for accurate template-based protein structure prediction
    Makigaki, Shuichiro
    Ishida, Takashi
    BIOINFORMATICS, 2020, 36 (01): 104-111
  • [9] Ten quick tips for sequence-based prediction of protein properties using machine learning
    Hou, Qingzhen
    Waury, Katharina
    Gogishvili, Dea
    Feenstra, K. Anton
    PLOS COMPUTATIONAL BIOLOGY, 2022, 18 (12)
  • [10] Swarm-based support vector machine optimization for protein sequence-encoded prediction
    Balaji, Prasanalakshmi
    Srinivasan, K.
    Mahaveerakannan, R.
    Maurya, Sudhanshu
    Kumar, T. Rajesh
    INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS, 2024