PreAlgPro: Prediction of allergenic proteins with pre-trained protein language model and efficient neural network

Cited by: 1
Authors
Zhang, Lingrong [1 ]
Liu, Taigang [1 ]
Affiliations
[1] Shanghai Ocean Univ, Coll Informat Technol, Shanghai 201306, Peoples R China
Keywords
Pre-trained protein language model; Allergenic proteins; Deep learning; Model interpretability; Database
DOI
10.1016/j.ijbiomac.2024.135762
Chinese Library Classification (CLC)
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline codes
071010; 081704
Abstract
Allergy is a prevalent phenomenon triggered by allergens found in common foods such as nuts and milk. Avoiding exposure to allergens is the most effective preventive measure against allergic reactions. However, current homology-based methods for identifying allergenic proteins encounter challenges when dealing with non-homologous data. Traditional machine learning approaches rely on manually extracted features, which lack important protein functional characteristics, including evolutionary information. Consequently, there is still considerable room for improvement in existing methods. In this study, we present PreAlgPro, a method for identifying allergenic proteins based on pre-trained protein language models and deep learning techniques. Specifically, we employed the ProtT5 model to extract protein embedding features, replacing the manual feature extraction step. Furthermore, we devised an Attention-CNN neural network architecture to identify potential features that contribute to the classification of allergenic proteins. The performance of our model was evaluated on four independent test sets, and the experimental results demonstrate that PreAlgPro surpasses existing state-of-the-art methods. Additionally, we collected allergenic protein samples to validate the robustness of the model and conducted an analysis of model interpretability.
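The abstract outlines the core pipeline: per-residue ProtT5 embeddings fed into an Attention-CNN classifier. The following is a minimal PyTorch sketch of such an architecture; the layer sizes, number of attention heads, and pooling choice are illustrative assumptions and do not reproduce the authors' published PreAlgPro configuration.

```python
# Minimal sketch of an Attention-CNN classifier over ProtT5 per-residue
# embeddings (1024-dimensional). Hyperparameters are illustrative assumptions,
# not the published PreAlgPro settings.
import torch
import torch.nn as nn

class AttentionCNN(nn.Module):
    def __init__(self, embed_dim=1024, conv_channels=128, num_classes=2):
        super().__init__()
        # Self-attention lets each residue position weigh the others,
        # highlighting regions that may drive allergenicity.
        self.attention = nn.MultiheadAttention(embed_dim, num_heads=8, batch_first=True)
        # 1D convolution plus global max pooling captures local sequence motifs
        # and collapses variable-length proteins to a fixed-size vector.
        self.conv = nn.Sequential(
            nn.Conv1d(embed_dim, conv_channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),
        )
        self.classifier = nn.Linear(conv_channels, num_classes)

    def forward(self, x):
        # x: (batch, seq_len, 1024) embeddings from a ProtT5 encoder
        attended, _ = self.attention(x, x, x)
        features = self.conv(attended.transpose(1, 2)).squeeze(-1)
        return self.classifier(features)  # allergenic vs. non-allergenic logits

# Usage with random tensors standing in for real ProtT5 embeddings.
model = AttentionCNN()
dummy = torch.randn(4, 200, 1024)  # 4 sequences of 200 residues each
logits = model(dummy)              # shape: (4, 2)
```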
Pages: 11
Related papers
50 records in total
  • [1] LPBERT: A Protein-Protein Interaction Prediction Method Based on a Pre-Trained Language Model
    Hu, An
    Kuang, Linai
    Yang, Dinghai
    APPLIED SCIENCES-BASEL, 2025, 15 (06)
  • [2] PepNet: an interpretable neural network for anti-inflammatory and antimicrobial peptides prediction using a pre-trained protein language model
    Han, Jiyun
    Kong, Tongxin
    Liu, Juntao
    COMMUNICATIONS BIOLOGY, 2024, 7 (01)
  • [3] PDNAPred: Interpretable prediction of protein-DNA binding sites based on pre-trained protein language models
    Zhang, Lingrong
    Liu, Taigang
    INTERNATIONAL JOURNAL OF BIOLOGICAL MACROMOLECULES, 2024, 281
  • [4] Porter 6: Protein Secondary Structure Prediction by Leveraging Pre-Trained Language Models (PLMs)
    Alanazi, Wafa
    Meng, Di
    Pollastri, Gianluca
    INTERNATIONAL JOURNAL OF MOLECULAR SCIENCES, 2025, 26 (01)
  • [5] Predictive Recognition of DNA-binding Proteins Based on Pre-trained Language Model BERT
    Ma, Yue
    Pei, Yongzhen
    Li, Changguo
    JOURNAL OF BIOINFORMATICS AND COMPUTATIONAL BIOLOGY, 2023, 21 (06)
  • [6] A survey of text classification based on pre-trained language model
    Wu, Yujia
    Wan, Jun
    NEUROCOMPUTING, 2025, 616
  • [7] Idiom Cloze Algorithm Integrating with Pre-trained Language Model
    Ju S.-G.
    Huang F.-Y.
    Sun J.-P.
    Ruan Jian Xue Bao/Journal of Software, 2022, 33 (10): 3793-3805
  • [8] Efficient Federated Learning with Pre-Trained Large Language Model Using Several Adapter Mechanisms
    Kim, Gyunyeop
    Yoo, Joon
    Kang, Sangwoo
    MATHEMATICS, 2023, 11 (21)
  • [9] Modelling a dense network for soft tissue prediction using pre-trained network
    Koppireddy, Chandra Sekhar
    Rao, G. Siva Nageswara
    INTERNATIONAL JOURNAL OF SYSTEM ASSURANCE ENGINEERING AND MANAGEMENT, 2024
  • [10] PaleAle 6.0: Prediction of Protein Relative Solvent Accessibility by Leveraging Pre-Trained Language Models (PLMs)
    Alanazi, Wafa
    Meng, Di
    Pollastri, Gianluca
    BIOMOLECULES, 2025, 15 (01)