The Historical Evolution and Significance of Multiple Sequence Alignment in Molecular Structure and Function Prediction

Cited by: 1
Authors
Zhang, Chenyue [1]
Wang, Qinxin [2]
Li, Yiyang [1]
Teng, Anqi [3]
Hu, Gang [1]
Wuyun, Qiqige [4]
Zheng, Wei [1, 5]
Affiliations
[1] Nankai Univ, Sch Stat & Data Sci, NITFID, LPMC & KLMDASR, Tianjin 300071, Peoples R China
[2] Suzhou New & High Tech Innovat Serv Ctr, Suzhou 215011, Peoples R China
[3] Hong Kong Univ Sci & Technol Guangzhou, Biosci & Biomed Engn Thrust, Syst Hub, Guangzhou 511453, Peoples R China
[4] Michigan State Univ, Dept Comp Sci & Engn, E Lansing, MI 48824 USA
[5] Univ Michigan, Dept Computat Med & Bioinformat, Ann Arbor, MI 48109 USA
Funding
National Science Foundation (USA); National Natural Science Foundation of China
Keywords
pairwise sequence alignment; multiple sequence alignment; protein monomer; protein complex; RNA; protein language model; function prediction; protein structure prediction; deep learning; PROTEIN HOMOLOGY DETECTION; META-THREADING-SERVER; SUBSTITUTION MATRICES; BINDING RESIDUES; NUCLEIC-ACID; WEB SERVER; SEARCH; IDENTIFICATION; RECOGNITION; CONTACTS;
DOI
10.3390/biom14121531
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Multiple sequence alignment (MSA) has evolved into a fundamental tool in the biological sciences, playing a pivotal role in predicting molecular structures and functions. With broad applications in protein and nucleic acid modeling, MSAs continue to underpin advancements across a range of disciplines. MSAs are not only foundational for traditional sequence comparison techniques but also increasingly important in the context of artificial intelligence (AI)-driven advancements. Recent breakthroughs in AI, particularly in protein and nucleic acid structure prediction, rely heavily on the accuracy and efficiency of MSAs to enhance remote homology detection and guide spatial restraints. This review traces the historical evolution of MSA, highlighting its significance in molecular structure and function prediction. We cover the methodologies used for protein monomers, protein complexes, and RNA, while also exploring emerging AI-based alternatives, such as protein language models, as complementary or replacement approaches to traditional MSAs in application tasks. By discussing the strengths, limitations, and applications of these methods, this review aims to provide researchers with valuable insights into MSA's evolving role, equipping them to make informed decisions in structural prediction research.
Pages: 37