Recent advances in features generation for membrane protein sequences: From multiple sequence alignment to pre-trained language models

Cited by: 2

Authors:

Ou, Yu-Yen ^{[1,2,3]}

Ho, Quang-Thai ^{[1]}

Chang, Heng-Ta ^{[1]}

Affiliations:

[1] Yuan Ze Univ, Dept Comp Sci & Engn, Chungli, Taiwan

[2] Yuan Ze Univ, Grad Sch Biotechnol & Bioengn, Chungli, Taiwan

[3] Yuan Ze Univ, Dept Comp Sci & Engn, Chungli 32003, Taiwan

Source:

PROTEOMICS | 2023 / Vol. 23 / Issue 23-24

Keywords:

machine learning; membrane proteins; pre-trained language model; SCORING MATRICES; RBF NETWORKS; PREDICTION; TOPOLOGY; LIFE;

DOI:

10.1002/pmic.202200494

Chinese Library Classification (CLC) number:

Q5 [Biochemistry]

Discipline classification codes:

071010; 081704

Abstract:

Membrane proteins play a crucial role in various cellular processes and are essential components of cell membranes. Computational methods have emerged as a powerful tool for studying membrane proteins due to their complex structures and properties that make them difficult to analyze experimentally. Traditional features for protein sequence analysis based on amino acid types, composition, and pair composition have limitations in capturing higher-order sequence patterns. Recently, multiple sequence alignment (MSA) and pre-trained language models (PLMs) have been used to generate features from protein sequences. However, the significant computational resources required for MSA-based features generation can be a major bottleneck for many applications. Several methods and tools have been developed to accelerate the generation of MSAs and reduce their computational cost, including heuristics and approximate algorithms. Additionally, the use of PLMs such as BERT has shown great potential in generating informative embeddings for protein sequence analysis. In this review, we provide an overview of traditional and more recent methods for generating features from protein sequences, with a particular focus on MSAs and PLMs. We highlight the advantages and limitations of these approaches and discuss the methods and tools developed to address the computational challenges associated with features generation. Overall, the advancements in computational methods and tools provide a promising avenue for gaining deeper insights into the function and properties of membrane proteins, which can have significant implications in drug discovery and personalized medicine.
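The "traditional" features named in the abstract, amino acid composition and pair (dipeptide) composition, can be illustrated with a minimal sketch. This is an illustration only, not the authors' implementation: the function names `amino_acid_composition` and `pair_composition`, the residue ordering, and the choice to ignore non-standard residues are all assumptions made here.

```python
from collections import Counter
from itertools import product

# Assumption: the 20 standard residues in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq: str) -> list[float]:
    """Return a 20-dimensional vector of residue frequencies.

    Non-standard characters (e.g. 'X') are ignored, which is an
    assumption of this sketch rather than a fixed convention.
    """
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def pair_composition(seq: str) -> list[float]:
    """Return a 400-dimensional vector of adjacent-pair frequencies."""
    seq = seq.upper()
    # Count every overlapping pair of neighbouring residues.
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    keys = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    total = sum(pairs[k] for k in keys)
    if total == 0:
        return [0.0] * len(keys)
    return [pairs[k] / total for k in keys]
```

Both vectors have a fixed length regardless of sequence length, which makes them easy to feed to a classifier. They only count residues and adjacent pairs, so longer-range patterns are lost; this is the limitation the abstract attributes to such features.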

Pages: 10

References: 53 in total

