OpenFold: retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization

Cited by: 88
Authors
Ahdritz, Gustaf [1, 2]
Bouatta, Nazim [3 ]
Floristean, Christina [1 ]
Kadyan, Sachin [1 ]
Xia, Qinghui [1 ]
Gerecke, William [3 ]
O'Donnell, Timothy J. [4 ]
Berenberg, Daniel [5 ]
Fisk, Ian [6 ]
Zanichelli, Niccolo [7 ]
Zhang, Bo [8 ]
Nowaczynski, Arkadiusz [9 ]
Wang, Bei [9 ]
Stepniewska-Dziubinska, Marta M. [9 ]
Zhang, Shang [9 ]
Ojewole, Adegoke [9 ]
Guney, Murat Efe [9 ]
Biderman, Stella [10, 11]
Watkins, Andrew M. [12 ]
Ra, Stephen [12 ]
Lorenzo, Pablo Ribalta [9 ]
Nivon, Lucas [13 ]
Weitzner, Brian [14 ]
Ban, Yih-En Andrew [15 ]
Chen, Shiyang [16 ]
Zhang, Minjia [17 ]
Li, Conglong [18 ]
Song, Shuaiwen Leon [18 ]
He, Yuxiong [18 ]
Sorger, Peter K. [3 ]
Mostaque, Emad [19 ]
Zhang, Zhao [16 ]
Bonneau, Richard [12 ]
AlQuraishi, Mohammed [1 ]
Affiliations
[1] Columbia Univ, Dept Syst Biol, New York, NY 10032 USA
[2] Harvard Univ, Cambridge, MA USA
[3] Harvard Med Sch, Lab Syst Pharmacol, Boston, MA 02115 USA
[4] Icahn Sch Med Mt Sinai, New York, NY USA
[5] NYU, Courant Inst Math Sci, Dept Comp Sci, New York, NY USA
[6] Flatiron Inst, New York, NY USA
[7] OpenBioML, Cambridge, MA USA
[8] Univ Utah, Sci Comp & Imaging Inst, Salt Lake City, UT USA
[9] NVIDIA, Santa Clara, CA USA
[10] EleutherAI, New York, NY USA
[11] Booz Allen Hamilton, Mclean, VA USA
[12] Prescient Design, Genentech, New York, NY USA
[13] Cyrus Bio, Seattle, WA USA
[14] Outpace Bio, Seattle, WA USA
[15] Arzeda, Seattle, WA USA
[16] Rutgers State Univ, New Brunswick, NJ USA
[17] Univ Illinois Champaign Urbana, Champaign, IL USA
[18] Microsoft, Redmond, WA USA
[19] Stability AI, Los Altos, CA USA
Keywords
PROTEIN-STRUCTURE PREDICTION; DOMAIN;
DOI
10.1038/s41592-024-02272-z
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
AlphaFold2 revolutionized structural biology with the ability to predict protein structures with exceptionally high accuracy. Its implementation, however, lacks the code and data required to train new models. These are necessary to (1) tackle new tasks, like protein-ligand complex structure prediction, (2) investigate the process by which the model learns and (3) assess the model's capacity to generalize to unseen regions of fold space. Here we report OpenFold, a fast, memory-efficient and trainable implementation of AlphaFold2. We train OpenFold from scratch, matching the accuracy of AlphaFold2. Having established parity, we find that OpenFold is remarkably robust at generalizing even when the size and diversity of its training set are deliberately limited, including near-complete elisions of classes of secondary structure elements. By analyzing intermediate structures produced during training, we also gain insights into the hierarchical manner in which OpenFold learns to fold. In sum, our studies demonstrate the power and utility of OpenFold, which we believe will prove to be a crucial resource for the protein modeling community. OpenFold is a trainable open-source implementation of AlphaFold2. It is fast and memory efficient, and the code and training data are available under a permissive license.
Pages: 1514-1524
Page count: 26