Co-evolution Transformer for Protein Contact Prediction

Cited by: 0
Authors
Zhang, He [1]
Ju, Fusong [2]
Zhu, Jianwei [2]
He, Liang [2]
Shao, Bin [2]
Zheng, Nanning [1]
Liu, Tie-Yan [2]
Affiliations
[1] Xi An Jiao Tong Univ, Xian, Peoples R China
[2] Microsoft Res Asia, Beijing, Peoples R China
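Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021) | 2021 / Vol. 34
Funding
National Key Research and Development Program;
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract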
Proteins are the main machinery of life, and protein functions are largely determined by their 3D structures. The pairwise proximity between the amino acids of a protein, known as the inter-residue contact map, characterizes the structural information of a protein well. Protein contact prediction (PCP) is an essential building block of many protein structure related applications. The prevalent approach to contact prediction estimates inter-residue contacts using hand-crafted coevolutionary features derived from multiple sequence alignments (MSAs). To mitigate the information loss caused by hand-crafted features, some recently proposed methods try to learn residue co-evolutions directly from MSAs. These methods generally derive coevolutionary features by aggregating the learned residue representations from individual sequences with equal weights, which is inconsistent with the premise that residue co-evolutions reflect the collective covariation patterns of numerous homologous proteins. Moreover, non-homologous residues and gaps commonly exist in MSAs. When features from all homologs are aggregated equally, the non-homologous information may cause misestimation of the residue co-evolutions. To overcome these issues, we propose an attention-based architecture, Co-evolution Transformer (CoT), for PCP. CoT jointly considers the information from all homologous sequences in the MSA to better capture global coevolutionary patterns. To mitigate the influence of the non-homologous information, CoT selectively aggregates the features from different homologs by assigning smaller weights to non-homologous sequences or residue pairs. Extensive experiments on two rigorous benchmark datasets demonstrate the effectiveness of CoT. In particular, CoT achieves a 51.6% top-L long-range precision score for the Free Modeling (FM) domains on the CASP14 benchmark, which outperforms the winning group of the CASP14 contact prediction challenge by 9.8%.
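The top-L long-range precision quoted in the abstract can be illustrated with a minimal sketch. The abstract does not define the metric, so the conventions below are assumptions taken from common practice in contact-prediction evaluation: two residues are in contact when their representative atoms are closer than 8 Å, a pair is long-range when its sequence separation is at least 24, and the L highest-scoring long-range pairs are evaluated, where L is the sequence length. The function names and parameters are hypothetical and are not taken from the paper.

```python
import numpy as np


def contact_map_from_coords(coords, cutoff=8.0):
    """Boolean (L, L) contact map from (L, 3) coordinates.

    Assumption: one representative atom per residue and an 8 Angstrom cutoff.
    """
    diffs = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diffs, axis=-1) < cutoff


def top_l_long_range_precision(pred, true_contacts, min_sep=24):
    """Fraction of true contacts among the top-L long-range predictions.

    pred:          (L, L) array of predicted contact scores.
    true_contacts: (L, L) boolean array of observed contacts.
    min_sep:       minimum sequence separation |i - j| (assumed to be 24).
    """
    length = pred.shape[0]
    # Upper-triangle pairs (i, j) with j - i >= min_sep.
    i, j = np.triu_indices(length, k=min_sep)
    scores = pred[i, j]
    k = min(length, scores.size)
    if k == 0:
        return float("nan")  # sequence too short to have long-range pairs
    top = np.argsort(-scores, kind="stable")[:k]
    return float(np.mean(true_contacts[i[top], j[top]]))
```

Under these assumptions, a score of 51.6% would mean that roughly half of the L most confident long-range predictions correspond to observed contacts.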
Pages: 12