Enzyme function prediction using contrastive learning

Times cited: 147
Authors
Yu, Tianhao [1, 2, 3]
Cui, Haiyang [1, 2, 3]
Li, Jianan Canal [3, 4]
Luo, Yunan [5]
Jiang, Guangde [1]
Zhao, Huimin [1, 2, 3, 6]
Affiliations
[1] Univ Illinois, Dept Chem & Biomol Engn, Urbana, IL 61801 USA
[2] Univ Illinois, Carl R Woese Inst Genom Biol, Urbana, IL 61801 USA
[3] Univ Illinois, Natl Sci Fdn Mol Maker Lab Inst, Urbana, IL 61801 USA
[4] Cornell Univ, Dept Comp Sci, Ithaca, NY 14850 USA
[5] Georgia Inst Technol, Sch Computat Sci & Engn, Atlanta, GA 30308 USA
[6] Univ Illinois, US Dept Energy Ctr Adv Bioenergy & Bioprod Innovat, Urbana, IL 61801 USA
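Funding
U.S. National Science Foundation
Keywords
CRYSTAL-STRUCTURE; PROTEIN; BACTERIAL; MECHANISM; EVOLUTION; FAMILIES; COFACTOR; SEQUENCE; BIOLOGY
DOI
10.1126/science.adf2465
Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biosciences]; N [Natural sciences in general]
Discipline classification codes
07; 0710; 09
Abstract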
Enzyme function annotation is a fundamental challenge, and numerous computational tools have been developed. However, most of these tools cannot accurately predict functional annotations, such as enzyme commission (EC) number, for less-studied proteins or those with previously uncharacterized functions or multiple activities. We present a machine learning algorithm named CLEAN (contrastive learning-enabled enzyme annotation) to assign EC numbers to enzymes with better accuracy, reliability, and sensitivity compared with the state-of-the-art tool BLASTp. The contrastive learning framework empowers CLEAN to confidently (i) annotate understudied enzymes, (ii) correct mislabeled enzymes, and (iii) identify promiscuous enzymes with two or more EC numbers. We demonstrate these functions through systematic in silico and in vitro experiments. We anticipate that this tool will be widely used for predicting the functions of uncharacterized enzymes, thereby advancing many fields, such as genomics, synthetic biology, and biocatalysis.
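The abstract names contrastive learning as the core technique but gives no implementation details. As a rough illustration of the general idea, here is a minimal sketch, under explicit assumptions, of how contrastively trained embeddings can be turned into EC predictions. A triplet margin loss pulls same-EC enzymes together and pushes different-EC enzymes apart. Each EC number is then represented by the mean embedding of its training enzymes, and a query is assigned the nearest EC center. The toy 2-D vectors, the function names, the centroid rule, and the `slack` threshold for reporting several EC numbers are all hypothetical choices made for this example. None of them is CLEAN's published implementation.

```python
import math
from collections import defaultdict

Vector = list[float]

def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_margin_loss(anchor: Vector, positive: Vector, negative: Vector,
                        margin: float = 1.0) -> float:
    """Zero once the different-EC negative is at least `margin` farther from
    the anchor than the same-EC positive; positive otherwise."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

def ec_centers(training: list[tuple[str, Vector]]) -> dict[str, Vector]:
    """Average the embeddings of all training enzymes sharing an EC number."""
    groups: dict[str, list[Vector]] = defaultdict(list)
    for ec, emb in training:
        groups[ec].append(emb)
    return {ec: [sum(col) / len(embs) for col in zip(*embs)]
            for ec, embs in groups.items()}

def predict_ec(query: Vector, centers: dict[str, Vector],
               slack: float = 0.5) -> list[str]:
    """Return the nearest EC number, plus any other EC whose center is within
    `slack` of the nearest distance (hypothetical rule for multi-EC enzymes)."""
    ranked = sorted((euclidean(query, c), ec) for ec, c in centers.items())
    best = ranked[0][0]
    return [ec for d, ec in ranked if d - best <= slack]

# Hypothetical toy embeddings; a real system would use a learned protein encoder.
training = [
    ("1.1.1.1", [0.0, 0.0]), ("1.1.1.1", [0.2, 0.0]),
    ("2.7.11.1", [5.0, 5.0]), ("2.7.11.1", [5.0, 5.2]),
    ("3.1.1.1", [0.0, 5.0]),
]
centers = ec_centers(training)
print(predict_ec([0.1, 0.1], centers))  # ['1.1.1.1']
```

A query lying roughly halfway between two EC centers, such as `[0.0, 2.5]` here, returns both EC numbers. This shows why a distance-based rule, rather than a single best hit, can flag candidate promiscuous enzymes. The `slack` value is arbitrary in this sketch and would need calibration in practice.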
Pages: 1358-1363
Number of pages: 6