Evolutionary-scale prediction of atomic-level protein structure with a language model

Cited by: 1178
Authors
Lin, Zeming [1, 2]
Akin, Halil [1]
Rao, Roshan [1]
Hie, Brian [1, 3]
Zhu, Zhongkai [1]
Lu, Wenting [1]
Smetanin, Nikita [1]
Verkuil, Robert [1]
Kabeli, Ori [1]
Shmueli, Yaniv [1]
Costa, Allan dos Santos [4]
Fazel-Zarandi, Maryam [1]
Sercu, Tom [1]
Candido, Salvatore [1]
Rives, Alexander [1, 2]
Affiliations
[1] Meta AI, FAIR, New York, NY 10024 USA
[2] NYU, New York, NY 10012 USA
[3] Stanford Univ, Palo Alto, CA USA
[4] MIT, Cambridge, MA USA
Keywords
CONTACTS;
DOI
10.1126/science.ade2574
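Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [Natural sciences, general]
Discipline classification codes
07; 0710; 09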
Abstract
Recent advances in machine learning have leveraged evolutionary information in multiple sequence alignments to predict protein structure. We demonstrate direct inference of full atomic-level protein structure from primary sequence using a large language model. As language models of protein sequences are scaled up to 15 billion parameters, an atomic-resolution picture of protein structure emerges in the learned representations. This results in an order-of-magnitude acceleration of high-resolution structure prediction, which enables large-scale structural characterization of metagenomic proteins. We apply this capability to construct the ESM Metagenomic Atlas by predicting structures for >617 million metagenomic protein sequences, including >225 million that are predicted with high confidence, which gives a view into the vast breadth and diversity of natural proteins.
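The abstract's "high confidence" subset can be illustrated with a minimal Python sketch of confidence filtering over predicted structures. It assumes, as an illustration and not as a statement from this record, that a prediction counts as high confidence when its mean pLDDT exceeds 70 and its pTM exceeds 0.7. The `Prediction` record, the cut-off constants, and the sample values are hypothetical, chosen only to exercise the filter.

```python
from dataclasses import dataclass

# Assumed cut-offs for "high confidence"; this record's abstract does not state them.
PLDDT_CUTOFF = 70.0
PTM_CUTOFF = 0.7


@dataclass(frozen=True)
class Prediction:
    """Hypothetical summary of one predicted structure."""
    seq_id: str
    mean_plddt: float  # mean per-residue pLDDT, on a 0-100 scale
    ptm: float         # predicted TM-score, on a 0-1 scale


def is_high_confidence(p: Prediction) -> bool:
    """True only if both assumed thresholds are exceeded."""
    return p.mean_plddt > PLDDT_CUTOFF and p.ptm > PTM_CUTOFF


def high_confidence_fraction(preds: list[Prediction]) -> float:
    """Share of predictions passing the filter (0.0 for an empty list)."""
    if not preds:
        return 0.0
    return sum(is_high_confidence(p) for p in preds) / len(preds)


if __name__ == "__main__":
    # Toy inputs, not data from the paper.
    preds = [
        Prediction("seq_a", 88.2, 0.91),
        Prediction("seq_b", 72.0, 0.65),  # fails the pTM cut-off
        Prediction("seq_c", 55.4, 0.40),
        Prediction("seq_d", 71.5, 0.74),
    ]
    print(high_confidence_fraction(preds))  # 0.5
    # Ratio implied by the abstract's lower-bound counts (225 and 617 million):
    print(round(225 / 617, 3))  # 0.365
```

Because both counts in the abstract are lower bounds (">225 million", ">617 million"), the last line only indicates that roughly a third of the atlas falls in the high-confidence set.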
Pages: 1123-1130
Page count: 8