Generative models for protein sequence modeling: recent advances and future directions

Cited by: 10
Authors
Mardikoraem, Mehrsa [1]
Wang, Zirui [2,3]
Pascual, Nathaniel
Woldring, Daniel [1,4]
Affiliations
[1] Michigan State Univ, Dept Chem Engn & Mat Sci, E Lansing, MI 48824 USA
[2] Regeneron Pharmaceut Inc, Tarrytown, NY USA
[3] Syracuse Univ, Comp Sci, Syracuse, NY USA
[4] MSU, Inst Quantitat Hlth Sci & Engn, E Lansing, MI 48824 USA
Keywords
generative machine learning (ML) models; protein engineering; generative adversarial neural networks (GANs); variational autoencoders (VAE); natural language processing (NLP); diffusion models; NEURAL-NETWORK; PREDICTION; LANGUAGE; DESIGN; LIFE; DNA;
DOI
10.1093/bib/bbad358
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
The widespread adoption of high-throughput omics technologies has exponentially increased the amount of protein sequence data involved in many salient disease pathways and their respective therapeutics and diagnostics. Despite the availability of large-scale sequence data, the lack of experimental fitness annotations underpins the need for self-supervised and unsupervised machine learning (ML) methods. These techniques leverage the meaningful features encoded in abundant unlabeled sequences to accomplish complex protein engineering tasks. Proficiency in the rapidly evolving fields of protein engineering and generative AI is required to realize the full potential of ML models as a tool for protein fitness landscape navigation. Here, we support this work by (i) providing an overview of the architecture and mathematical details of the most successful ML models applicable to sequence data (e.g. variational autoencoders, autoregressive models, generative adversarial neural networks, and diffusion models), (ii) guiding how to effectively implement these models on protein sequence data to predict fitness or generate high-fitness sequences and (iii) highlighting several successful studies that implement these techniques in protein engineering (from paratope regions and subcellular localization prediction to high-fitness sequences and protein design rules generation). By providing a comprehensive survey of model details, novel architecture developments, comparisons of model applications, and current challenges, this study intends to provide structured guidance and a robust framework for delivering a prospective outlook in the ML-driven protein engineering field.
Pages: 19