Interpreting Potts and Transformer Protein Models Through the Lens of Simplified Attention

Cited by: 0
Authors
Bhattacharya, Nicholas [1]
Thomas, Neil [1]
Rao, Roshan [1]
Dauparas, Justas [2]
Koo, Peter K. [3]
Baker, David [2]
Song, Yun S. [1,4]
Ovchinnikov, Sergey [5]
Affiliations
[1] Univ Calif Berkeley, Berkeley, CA 94720 USA
[2] Univ Washington, Seattle, WA 98195 USA
[3] Cold Spring Harbor Lab, POB 100, Cold Spring Harbor, NY 11724 USA
[4] Chan Zuckerberg Biohub, San Francisco, CA USA
[5] Harvard Univ, Cambridge, MA 02138 USA
Source
BIOCOMPUTING 2022, PSB 2022 | 2022
Keywords
Contact Prediction; Representation Learning; Language Modeling; Attention; Transformer; BERT; Markov Random Fields; Potts Models; Self-supervised Learning; Coevolution; Sequence; UniRef
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
The established approach to unsupervised protein contact prediction estimates coevolving positions using undirected graphical models, training a Potts model on a multiple sequence alignment (MSA). Increasingly large Transformers are now being pretrained on unlabeled, unaligned protein sequence databases and show competitive performance on protein contact prediction. We argue that attention is a principled model of protein interactions, grounded in real properties of protein family data. We introduce an energy-based attention layer, factored attention, which recovers a Potts model in a certain limit, and use it to contrast Potts models and Transformers. We show that the Transformer leverages hierarchical signal in protein family databases that single-layer models do not capture. This raises the exciting possibility of developing powerful structured models of protein family databases.
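
To make the factored attention construction concrete, below is a minimal PyTorch sketch of such a layer. All names, shapes, and initialization choices are illustrative assumptions, not the authors' reference implementation: each head learns per-position queries and keys, so the attention map depends only on position (never on sequence content), plus a value matrix over amino-acid symbols; the induced couplings between alignment columns factor through the shared attention maps, which is what ties the layer back to a Potts model.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FactoredAttention(nn.Module):
        """Single factored-attention layer (illustrative sketch).

        Queries and keys are learned per alignment position, so the
        attention map is shared by every input sequence; each head's
        value matrix mixes amino-acid identities. The coupling between
        columns i and j thus factors as attn[h, i, j] * value[h], the
        structure that connects this layer to a Potts coupling tensor.
        """

        def __init__(self, seq_len, n_heads=32, d_head=64, vocab=21):
            super().__init__()
            self.d_head = d_head
            self.query = nn.Parameter(0.01 * torch.randn(n_heads, seq_len, d_head))
            self.key = nn.Parameter(0.01 * torch.randn(n_heads, seq_len, d_head))
            self.value = nn.Parameter(0.01 * torch.randn(n_heads, vocab, vocab))
            self.bias = nn.Parameter(torch.zeros(seq_len, vocab))  # single-site fields

        def attention_maps(self):
            # Position-only attention logits, scaled and normalized per row.
            logits = torch.einsum("hid,hjd->hij", self.query, self.key)
            return F.softmax(logits / self.d_head ** 0.5, dim=-1)  # (heads, L, L)

        def forward(self, x_onehot):
            # x_onehot: (batch, L, vocab) one-hot aligned sequences. Returns
            # per-position amino-acid logits for a masked-token or
            # pseudolikelihood objective. (A fuller model would also mask
            # the diagonal so a position cannot attend to itself.)
            attn = self.attention_maps()
            out = torch.einsum("hij,bja,hac->bic", attn, x_onehot, self.value)
            return out + self.bias

        def contact_scores(self):
            # Crude contact estimate: head-averaged, symmetrized attention.
            avg = self.attention_maps().mean(dim=0)
            return 0.5 * (avg + avg.T)

Trained on a single family's MSA with a masked-token cross-entropy loss, for example:

    layer = FactoredAttention(seq_len=128)
    msa = F.one_hot(torch.randint(0, 21, (64, 128)), num_classes=21).float()
    loss = F.cross_entropy(layer(msa).reshape(-1, 21), msa.argmax(-1).reshape(-1))

contact predictions can then be read off layer.contact_scores(), in the same spirit as extracting couplings from a fitted Potts model.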
Pages: 34-45
Page count: 12