findMySequence: a neural-network-based approach for identification of unknown proteins in X-ray crystallography and cryo-EM

Cited by: 39
Authors
Chojnowski, Grzegorz [1]
Simpkin, Adam J. [2]
Leonardo, Diego A. [3]
Seifert-Davila, Wolfram [4]
Vivas-Ruiz, Dan E. [5]
Keegan, Ronan M. [6]
Rigden, Daniel J. [2]
Affiliations
[1] European Mol Biol Lab, Hamburg Unit, Notkestr 85, D-22607 Hamburg, Germany
[2] Univ Liverpool, Inst Syst Mol & Integrat Biol, Liverpool L69 7ZB, Merseyside, England
[3] Univ Sao Paulo, Sao Carlos Inst Phys, Ave Joao Dagnone 1100, BR-13563120 Sao Carlos, SP, Brazil
[4] European Mol Biol Lab, Meyerhofstr 1, D-69117 Heidelberg, Germany
[5] Univ Nacl Mayor San Marcos, Fac Ciencias Biol, Lab Biologia Mol, Ave Venezuela Cdra 34 S-N, Ciudad Univ, Lima, Peru
[6] Rutherford Appleton Lab, UKRI STFC, Res Complex Harwell, Didcot OX11 0FA, Oxon, England
Funding
UK Biotechnology and Biological Sciences Research Council (BBSRC);
Keywords
protein structures; protein sequences; SIMBAD; cryo-EM; bioinformatics; structure determination; findMySequence; neural networks; STENOTROPHOMONAS-MALTOPHILIA; ANGSTROM RESOLUTION; SEQUENCE ASSIGNMENT; CRYSTAL-STRUCTURE; REFINEMENT; BINDING; NEUTRALIZATION; COMPLEX; SEARCH; VENOMS;
DOI
10.1107/S2052252521011088
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Although experimental protein-structure determination usually targets known proteins, chains of unknown sequence are often encountered. They can be purified from natural sources, appear as an unexpected fragment of a well characterized protein or appear as a contaminant. Regardless of the source of the problem, the unknown protein always requires characterization. Here, an automated pipeline is presented for the identification of protein sequences from cryo-EM reconstructions and crystallographic data. The method's application to characterize the crystal structure of an unknown protein purified from a snake venom is presented. It is also shown that the approach can be successfully applied to the identification of protein sequences and validation of sequence assignments in cryo-EM protein structures.
Pages: 86 / +
Number of pages: 15