Recent advances in features generation for membrane protein sequences: From multiple sequence alignment to pre-trained language models

Times cited: 2
Authors
Ou, Yu-Yen [1, 2, 3]
Ho, Quang-Thai [1 ]
Chang, Heng-Ta [1 ]
Affiliations
[1] Yuan Ze Univ, Dept Comp Sci & Engn, Chungli, Taiwan
[2] Yuan Ze Univ, Grad Sch Biotechnol & Bioengn, Chungli, Taiwan
[3] Yuan Ze Univ, Dept Comp Sci & Engn, Chungli 32003, Taiwan
Keywords
machine learning; membrane proteins; pre-trained language model; SCORING MATRICES; RBF NETWORKS; PREDICTION; TOPOLOGY; LIFE;
DOI
10.1002/pmic.202200494
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Membrane proteins play a crucial role in various cellular processes and are essential components of cell membranes. Computational methods have emerged as a powerful tool for studying membrane proteins due to their complex structures and properties that make them difficult to analyze experimentally. Traditional features for protein sequence analysis based on amino acid types, composition, and pair composition have limitations in capturing higher-order sequence patterns. Recently, multiple sequence alignment (MSA) and pre-trained language models (PLMs) have been used to generate features from protein sequences. However, the significant computational resources required for MSA-based features generation can be a major bottleneck for many applications. Several methods and tools have been developed to accelerate the generation of MSAs and reduce their computational cost, including heuristics and approximate algorithms. Additionally, the use of PLMs such as BERT has shown great potential in generating informative embeddings for protein sequence analysis. In this review, we provide an overview of traditional and more recent methods for generating features from protein sequences, with a particular focus on MSAs and PLMs. We highlight the advantages and limitations of these approaches and discuss the methods and tools developed to address the computational challenges associated with features generation. Overall, the advancements in computational methods and tools provide a promising avenue for gaining deeper insights into the function and properties of membrane proteins, which can have significant implications in drug discovery and personalized medicine.
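The "traditional" features mentioned in the abstract (amino acid composition and pair composition) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `amino_acid_composition` and `pair_composition` are hypothetical, the alphabet is the 20 standard residues, and pairs containing a non-standard residue are simply skipped.

```python
from itertools import product

# Assumption: the 20 standard amino acids, in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq):
    """Return a 20-dimensional vector of residue frequencies."""
    residues = [aa for aa in seq.upper() if aa in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(aa) / len(residues) for aa in AMINO_ACIDS]


def pair_composition(seq):
    """Return a 400-dimensional vector of adjacent-pair frequencies."""
    seq = seq.upper()
    counts = dict.fromkeys(PAIRS, 0)
    total = 0
    # Slide over adjacent positions; skip pairs with non-standard residues.
    for a, b in zip(seq, seq[1:]):
        if a in AMINO_ACIDS and b in AMINO_ACIDS:
            counts[a + b] += 1
            total += 1
    if total == 0:
        return [0.0] * len(PAIRS)
    return [counts[p] / total for p in PAIRS]


if __name__ == "__main__":
    example = "MKTLLVAGAV"  # made-up sequence, for illustration only
    print(len(amino_acid_composition(example)), len(pair_composition(example)))
```

Both vectors have a fixed length regardless of sequence length, which makes them convenient classifier inputs. They record only local counts, which is the limitation in capturing higher-order sequence patterns that the abstract points to.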
Pages: 10