POLAT: Protein function prediction based on soft mask graph network and residue-Label ATtention

Cited by: 2
Authors
Liu, Yang [1]
Zhang, Yi [1]
Chen, Zihao
Peng, Jing [1]
Affiliation
[1] Wuhan Univ Technol, Sch Comp & Artificial Intelligence, Intelligent Bioinformat Lab, Wuhan 430070, Peoples R China
Keywords
Protein function prediction; Gene Ontology; Protein contact map; Graph Neural Network; SEQUENCE; ONTOLOGY
DOI
10.1016/j.compbiolchem.2024.108064
CLC classification number
Q [Biological Sciences]
Discipline classification code
07; 0710; 09
Abstract
Motivation: Elucidating protein function is a central problem in biochemistry, genetics, and molecular biology. Computational methods for protein function prediction are critical because of the large gap between the amount of available sequence data and functional data. Recent advances in the prediction of protein structure, which correlates strongly with function, make it feasible to predict function from structure. However, current structure-based methods overlook the fact that individual residues may contribute differently to a protein's function, and they do not take into account the correlation between residues and functions. How to effectively use the relationship between protein residues and function-level information to predict protein function remains an open problem.
Results: We propose POLAT, a protein function prediction method based on Soft Mask Graph Networks and Residue-Label Attention, which combines sequence features, predicted structure features, and function-level information to produce accurate predictions. Soft mask graph networks adaptively extract the residues relevant to functions. A residue-label attention mechanism then produces protein-level encoded features, which are concatenated with a protein-level embedding and fed into a dense classifier to estimate the probability of each function. On the PDB cdhit test set, POLAT achieves Fmax of 0.670, 0.515, and 0.578 and AUPR of 0.677, 0.409, and 0.507 for the MFO, BPO, and CCO domains, respectively, outperforming the existing structure-based state-of-the-art (SOTA) method GAT-GO (Fmax 0.633, 0.492, 0.547; AUPR 0.660, 0.381, 0.479). In extensive experiments, POLAT is also competitive with sequence-based and multimodal methods and achieves SOTA performance on three of six metrics.
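The residue-label attention step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no equations, so the scaled dot-product scoring, the one-query-vector-per-label layout, the per-label linear classifier, and every name below (`residue_label_attention`, `predict`, `label_queries`, `clf_weights`) are assumptions made for illustration. The soft mask graph network that would produce the residue features is not modeled; random vectors stand in for its output.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def residue_label_attention(residues, label_queries):
    """Pool residue features once per label (assumed formulation).

    residues:      L vectors of dimension d (one per residue)
    label_queries: C vectors of dimension d (one per function label)
    Returns (pooled, weights): C pooled d-vectors and C attention rows of length L.
    """
    scale = math.sqrt(len(residues[0]))
    pooled, weights = [], []
    for query in label_queries:
        # Each label scores every residue, so residues can matter differently per function.
        row = softmax([dot(query, r) / scale for r in residues])
        weights.append(row)
        pooled.append([sum(w * r[k] for w, r in zip(row, residues))
                       for k in range(len(query))])
    return pooled, weights


def predict(residues, label_queries, protein_embedding, clf_weights, clf_bias):
    """Concatenate each pooled vector with a protein-level embedding, then apply
    a per-label linear layer and a sigmoid (a stand-in for the dense classifier)."""
    pooled, _ = residue_label_attention(residues, label_queries)
    probs = []
    for c, vec in enumerate(pooled):
        features = vec + protein_embedding  # list concatenation
        logit = dot(clf_weights[c], features) + clf_bias[c]
        probs.append(1.0 / (1.0 + math.exp(-logit)))
    return probs


# Toy dimensions: 6 residues, 4 features, 3 labels, 2-dim protein embedding.
random.seed(0)
d, n_res, n_labels, d_prot = 4, 6, 3, 2
rand_vec = lambda n: [random.gauss(0.0, 1.0) for _ in range(n)]
residues = [rand_vec(d) for _ in range(n_res)]
label_queries = [rand_vec(d) for _ in range(n_labels)]
protein_embedding = rand_vec(d_prot)
clf_weights = [rand_vec(d + d_prot) for _ in range(n_labels)]
clf_bias = [0.0] * n_labels

pooled, weights = residue_label_attention(residues, label_queries)
probs = predict(residues, label_queries, protein_embedding, clf_weights, clf_bias)
```

Each row of `weights` is a distribution over residues for one label, which is one way to express the idea that individual residues contribute differently to different functions. The sigmoid output treats labels independently, as is usual for multi-label Gene Ontology prediction.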
Pages: 8