Recent advances in features generation for membrane protein sequences: From multiple sequence alignment to pre-trained language models

Cited by: 2
Authors
Ou, Yu-Yen [1, 2, 3]
Ho, Quang-Thai [1]
Chang, Heng-Ta [1]
Affiliations
[1] Yuan Ze Univ, Dept Comp Sci & Engn, Chungli, Taiwan
[2] Yuan Ze Univ, Grad Sch Biotechnol & Bioengn, Chungli, Taiwan
[3] Yuan Ze Univ, Dept Comp Sci & Engn, Chungli 32003, Taiwan
Keywords
machine learning; membrane proteins; pre-trained language model; scoring matrices; RBF networks; prediction; topology; life
DOI
10.1002/pmic.202200494
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Membrane proteins play a crucial role in various cellular processes and are essential components of cell membranes. Computational methods have emerged as a powerful tool for studying membrane proteins, whose complex structures and properties make them difficult to analyze experimentally. Traditional features for protein sequence analysis, based on amino acid types, composition, and pair composition, have limitations in capturing higher-order sequence patterns. Recently, multiple sequence alignment (MSA) and pre-trained language models (PLMs) have been used to generate features from protein sequences. However, the significant computational resources required for MSA-based features generation can be a major bottleneck for many applications. Several methods and tools, including heuristics and approximate algorithms, have been developed to accelerate the generation of MSAs and reduce their computational cost. Additionally, PLMs such as BERT have shown great potential in generating informative embeddings for protein sequence analysis. In this review, we provide an overview of traditional and more recent methods for generating features from protein sequences, with a particular focus on MSAs and PLMs. We highlight the advantages and limitations of these approaches and discuss the methods and tools developed to address the computational challenges associated with features generation. Overall, the advancements in computational methods and tools provide a promising avenue for gaining deeper insights into the function and properties of membrane proteins, which can have significant implications for drug discovery and personalized medicine.
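
The "composition" and "pair composition" features named in the abstract can be illustrated with a minimal sketch. This is not code from the reviewed paper: the function names are hypothetical, and normalizing counts to fractions over the 20 standard residues is an assumption about how such features are commonly built.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (assumed alphabet)


def amino_acid_composition(seq):
    """Fraction of each standard amino acid in seq (20 values)."""
    seq = seq.upper()
    total = sum(seq.count(a) for a in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [seq.count(a) / total for a in AMINO_ACIDS]


def pair_composition(seq):
    """Fraction of each adjacent residue pair in seq (400 values)."""
    seq = seq.upper()
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing non-standard symbols
            counts[pair] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(pairs)
    return [counts[p] / total for p in pairs]


def composition_features(seq):
    """Concatenate both feature sets into one 420-dimensional vector."""
    return amino_acid_composition(seq) + pair_composition(seq)
```

Both vectors depend only on residue counts and adjacent pairs, so they ignore longer-range order in the sequence, which is the limitation the abstract attributes to traditional features.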
Pages: 10
Related papers
53 in total
  • [11] ProtGPT2 is a deep unsupervised language model for protein design
    Ferruz, Noelia
    Schmidt, Steffen
    Hocker, Birte
    [J]. NATURE COMMUNICATIONS, 2022, 13 (01)
  • [12] Identification of adaptor proteins by incorporating deep learning and PSSM profiles
    Gao, Wentao
    Xu, Dali
    Li, Hongfei
    Du, Junping
    Wang, Guohua
    Li, Dan
    [J]. METHODS, 2023, 209 : 10 - 17
  • [13] Bioinformatics approaches for functional annotation of membrane proteins
    Gromiha, M. Michael
    Ou, Yu-Yen
    [J]. BRIEFINGS IN BIOINFORMATICS, 2014, 15 (02) : 155 - 168
  • [14] Functional discrimination of membrane proteins using machine learning techniques
    Gromiha, M. Michael
    Yabuki, Yukimitsu
    [J]. BMC BIOINFORMATICS, 2008, 9 (1)
  • [15] A simple statistical method for discriminating outer membrane proteins with better accuracy
    Gromiha, MM
    Suwa, M
    [J]. BIOINFORMATICS, 2005, 21 (07) : 961 - 968
  • [16] Accurate classification of membrane protein types based on sequence and evolutionary information using deep learning
    Guo, Lei
    Wang, Shunfang
    Li, Mingyuan
    Cao, Zicheng
    [J]. BMC BIOINFORMATICS, 2019, 20 (01)
  • [17] mCNN-ETC: identifying electron transporters and their functional families by using multiple windows scanning techniques in convolutional neural networks with evolutionary information of protein sequences
    Ho, Quang-Thai
    Le, Nguyen Quoc Khanh
    Ou, Yu-Yen
    [J]. BRIEFINGS IN BIOINFORMATICS, 2022, 23 (01)
  • [18] Protein secondary structure prediction based on position-specific scoring matrices
    Jones, DT
    [J]. JOURNAL OF MOLECULAR BIOLOGY, 1999, 292 (02) : 195 - 202
  • [19] Highly accurate protein structure prediction with AlphaFold
    Jumper, John
    Evans, Richard
    Pritzel, Alexander
    Green, Tim
    Figurnov, Michael
    Ronneberger, Olaf
    Tunyasuvunakool, Kathryn
    Bates, Russ
    Zidek, Augustin
    Potapenko, Anna
    Bridgland, Alex
    Meyer, Clemens
    Kohl, Simon A. A.
    Ballard, Andrew J.
    Cowie, Andrew
    Romera-Paredes, Bernardino
    Nikolov, Stanislav
    Jain, Rishub
    Adler, Jonas
    Back, Trevor
    Petersen, Stig
    Reiman, David
    Clancy, Ellen
    Zielinski, Michal
    Steinegger, Martin
    Pacholska, Michalina
    Berghammer, Tamas
    Bodenstein, Sebastian
    Silver, David
    Vinyals, Oriol
    Senior, Andrew W.
    Kavukcuoglu, Koray
    Kohli, Pushmeet
    Hassabis, Demis
    [J]. NATURE, 2021, 596 (7873) : 583 - +
  • [20] Recent developments in deep learning applied to protein structure prediction
    Kandathil, Shaun M.
    Greener, Joe G.
    Jones, David T.
    [J]. PROTEINS-STRUCTURE FUNCTION AND BIOINFORMATICS, 2019, 87 (12) : 1179 - 1189