The Historical Evolution and Significance of Multiple Sequence Alignment in Molecular Structure and Function Prediction

Cited by: 1
Authors
Zhang, Chenyue [1]
Wang, Qinxin [2]
Li, Yiyang [1]
Teng, Anqi [3]
Hu, Gang [1]
Wuyun, Qiqige [4]
Zheng, Wei [1, 5]
Affiliations
[1] Nankai Univ, Sch Stat & Data Sci, NITFID, LPMC & KLMDASR, Tianjin 300071, Peoples R China
[2] Suzhou New & High Tech Innovat Serv Ctr, Suzhou 215011, Peoples R China
[3] Hong Kong Univ Sci & Technol Guangzhou, Biosci & Biomed Engn Thrust, Syst Hub, Guangzhou 511453, Peoples R China
[4] Michigan State Univ, Dept Comp Sci & Engn, E Lansing, MI 48824 USA
[5] Univ Michigan, Dept Computat Med & Bioinformat, Ann Arbor, MI 48109 USA
Funding
US National Science Foundation; National Natural Science Foundation of China
Keywords
pairwise sequence alignment; multiple sequence alignment; protein monomer; protein complex; RNA; protein language model; function prediction; protein structure prediction; deep learning; PROTEIN HOMOLOGY DETECTION; META-THREADING-SERVER; SUBSTITUTION MATRICES; BINDING RESIDUES; NUCLEIC-ACID; WEB SERVER; SEARCH; IDENTIFICATION; RECOGNITION; CONTACTS
DOI
10.3390/biom14121531
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
Multiple sequence alignment (MSA) has evolved into a fundamental tool in the biological sciences, playing a pivotal role in predicting molecular structures and functions. With broad applications in protein and nucleic acid modeling, MSAs continue to underpin advancements across a range of disciplines. MSAs are not only foundational for traditional sequence comparison techniques but also increasingly important in the context of artificial intelligence (AI)-driven advancements. Recent breakthroughs in AI, particularly in protein and nucleic acid structure prediction, rely heavily on the accuracy and efficiency of MSAs to enhance remote homology detection and guide spatial restraints. This review traces the historical evolution of MSA, highlighting its significance in molecular structure and function prediction. We cover the methodologies used for protein monomers, protein complexes, and RNA, while also exploring emerging AI-based alternatives, such as protein language models, as complementary or replacement approaches to traditional MSAs in application tasks. By discussing the strengths, limitations, and applications of these methods, this review aims to provide researchers with valuable insights into MSA's evolving role, equipping them to make informed decisions in structural prediction research.
Pages: 37