NetSurfP-3.0: accurate and fast prediction of protein structural features by protein language models and deep learning

Times cited: 153
Authors
Hoie, Magnus Haraldson [1]
Kiehl, Erik Nicolas [1]
Petersen, Bent [2,3]
Nielsen, Morten [1]
Winther, Ole [4,5,6]
Nielsen, Henrik [1]
Hallgren, Jeppe [7]
Marcatili, Paolo [1]
Affiliations
[1] Tech Univ Denmark, Dept Hlth Technol, Lyngby, Denmark
[2] Univ Copenhagen, Ctr Evolutionary Hologen, GLOBE Inst, Copenhagen, Denmark
[3] AIMST Univ, Fac Appl Sci, Ctr Excellence Omics Driven Computat Biodiscovery, Bedong, Kedah, Malaysia
[4] Tech Univ Denmark DTU, Sect Cognit Syst, DTU Compute, Lyngby, Denmark
[5] Copenhagen Univ Hosp, Ctr Genom Med, Rigshosp, Copenhagen, Denmark
[6] Univ Copenhagen, Bioinformat Ctr, Dept Biol, Copenhagen, Denmark
[7] BioLib Technol, Copenhagen, Denmark
Keywords
MULTIPLE SEQUENCE ALIGNMENT;
DOI
10.1093/nar/gkac439
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
Recent advances in machine learning and natural language processing have made it possible to profoundly advance our ability to accurately predict protein structures and their functions. While such improvements are significantly impacting the fields of biology and biotechnology at large, such methods have the downside of high demands in terms of computing power and runtime, hampering their applicability to large datasets. Here, we present NetSurfP-3.0, a tool for predicting solvent accessibility, secondary structure, structural disorder and backbone dihedral angles for each residue of an amino acid sequence. This NetSurfP update exploits recent advances in pre-trained protein language models to reduce the runtime of its predecessor by two orders of magnitude, while displaying similar prediction performance. We assessed the accuracy of NetSurfP-3.0 on several independent test datasets and found it to consistently produce state-of-the-art predictions for each of its output features, with a runtime that is up to 600 times faster than the most commonly available methods performing the same tasks. The tool is freely available as a web server with a user-friendly interface to navigate the results, as well as a standalone downloadable package.
Pages: W510-W515
Number of pages: 6