PreAlgPro: Prediction of allergenic proteins with pre-trained protein language model and efficient neutral network

Cited by: 1
Authors
Zhang, Lingrong [1]
Liu, Taigang [1]
Affiliations
[1] Shanghai Ocean Univ, Coll Informat Technol, Shanghai 201306, Peoples R China
Keywords
Pre-trained protein language model; Allergenic proteins; Deep learning; Model interpretability; DATABASE;
DOI
10.1016/j.ijbiomac.2024.135762
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010; 081704;
Abstract
Allergy is a prevalent phenomenon, involving allergens such as nuts and milk. Avoiding exposure to allergens is the most effective preventive measure against allergic reactions. However, current homology-based methods for identifying allergenic proteins encounter challenges when dealing with non-homologous data. Traditional machine learning approaches rely on manually extracted features, which lack important protein functional characteristics, including evolutionary information. Consequently, there is still considerable room for improvement in existing methods. In this study, we present PreAlgPro, a method for identifying allergenic proteins based on pre-trained protein language models and deep learning techniques. Specifically, we employed the ProtT5 model to extract protein embedding features, replacing the manual feature extraction step. Furthermore, we devised an Attention-CNN neural network architecture to identify potential features that contribute to the classification of allergenic proteins. The performance of our model was evaluated on four independent test sets, and the experimental results demonstrate that PreAlgPro surpasses existing state-of-the-art methods. Additionally, we collected allergenic protein samples to validate the robustness of the model and conducted an analysis of model interpretability.
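The pipeline described in the abstract (per-residue embeddings from a protein language model, followed by a convolutional network with attention) can be illustrated with a minimal sketch. This is not the authors' implementation: the layer order, kernel size, number of filters, and every function and parameter name below are assumptions made for illustration, and a small random matrix stands in for real ProtT5 embeddings so that the example runs with the standard library only.

```python
import math
import random


def conv1d_relu(seq, kernels, bias):
    """Valid 1D convolution along the residue axis, followed by ReLU.

    seq: L x D embedding matrix (list of lists).
    kernels: F filters, each of shape k x D.
    Returns a (L - k + 1) x F feature matrix.
    """
    k = len(kernels[0])
    out = []
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        row = []
        for f, kern in enumerate(kernels):
            s = bias[f]
            for w_row, x_row in zip(kern, window):
                s += sum(w * x for w, x in zip(w_row, x_row))
            row.append(max(0.0, s))
        out.append(row)
    return out


def attention_pool(feats, query):
    """Softmax attention over positions; returns pooled vector and weights."""
    scores = [sum(f * q for f, q in zip(row, query)) for row in feats]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [sum(w * row[j] for w, row in zip(weights, feats))
              for j in range(len(feats[0]))]
    return pooled, weights


def init_params(dim, n_filters, kernel_size, seed=0):
    """Random, untrained parameters (hypothetical sizes, for illustration)."""
    rng = random.Random(seed)

    def g():
        return rng.gauss(0.0, 0.1)

    return {
        "kernels": [[[g() for _ in range(dim)] for _ in range(kernel_size)]
                    for _ in range(n_filters)],
        "bias": [0.0] * n_filters,
        "query": [g() for _ in range(n_filters)],
        "out_w": [g() for _ in range(n_filters)],
        "out_b": 0.0,
    }


def predict(seq, params):
    """Return (probability of the positive class, attention weights)."""
    feats = conv1d_relu(seq, params["kernels"], params["bias"])
    pooled, weights = attention_pool(feats, params["query"])
    logit = sum(w * x for w, x in zip(params["out_w"], pooled)) + params["out_b"]
    return 1.0 / (1.0 + math.exp(-logit)), weights


# Stand-in for a protein-language-model embedding: 12 residues x 8 dimensions.
rng = random.Random(1)
toy_embedding = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(12)]
params = init_params(dim=8, n_filters=4, kernel_size=3)
prob, attn = predict(toy_embedding, params)
print(f"probability={prob:.3f}, attention positions={len(attn)}")
```

Because the parameters are untrained, the printed probability carries no biological meaning; the sketch only shows how data would flow through such a model. The attention weights returned by `predict` give one value per convolution window, which is the kind of quantity an interpretability analysis could inspect.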
Pages: 11