InterPro in 2019: improving coverage, classification and access to protein sequence annotations

Times cited: 1057
Authors
Mitchell, Alex L. [1]
Attwood, Teresa K. [2]
Babbitt, Patricia C. [3]
Blum, Matthias [1]
Bork, Peer [4]
Bridge, Alan [5]
Brown, Shoshana D. [3]
Chang, Hsin-Yu [1]
El-Gebali, Sara [1]
Fraser, Matthew I. [1]
Gough, Julian [6]
Haft, David R. [7]
Huang, Hongzhan [8]
Letunic, Ivica [9]
Lopez, Rodrigo [1]
Luciani, Aurelien [1]
Madeira, Fabio [1]
Marchler-Bauer, Aron [10]
Mi, Huaiyu [11]
Natale, Darren A. [12]
Necci, Marco [13, 14, 15]
Nuka, Gift [1]
Orengo, Christine [16]
Pandurangan, Arun P. [6]
Paysan-Lafosse, Typhaine [1]
Pesseat, Sebastien [1]
Potter, Simon C. [1]
Qureshi, Matloob A. [1]
Rawlings, Neil D. [1]
Redaschi, Nicole [5]
Richardson, Lorna J. [1]
Rivoire, Catherine [5]
Salazar, Gustavo A. [1]
Sangrador-Vegas, Amaia [1]
Sigrist, Christian J. A. [5]
Sillitoe, Ian [16]
Sutton, Granger G. [7]
Thanki, Narmada [10]
Thomas, Paul D. [11]
Tosatto, Silvio C. E. [13]
Yong, Siew-Yit [1]
Finn, Robert D. [1]
Affiliations
[1] EMBL EBI, Wellcome Trust Genome Campus, Cambridge CB10 1SD, England
[2] Univ Manchester, Sch Comp Sci, Manchester M13 9PL, Lancs, England
[3] Univ Calif San Francisco, Dept Bioengn & Therapeut Sci, San Francisco, CA 94158 USA
[4] European Mol Biol Lab, Struct & Computat Biol Unit, Meyerhofstr 1, D-69117 Heidelberg, Germany
[5] CMU, SIB Swiss Inst Bioinformat, Swiss Prot Grp, 1 Rue Michel Servet, CH-1211 Geneva 4, Switzerland
[6] MRC, Lab Mol Biol, Francis Crick Ave, Cambridge Biomed Campus, Cambridge CB2 0QH, England
[7] JCVI, 9605 Med Ctr Dr, Suite 150, Rockville, MD 20850 USA
[8] Univ Delaware, Ctr Bioinformat & Computat Biol, Newark, DE USA
[9] Biobyte Solut GmbH, Bothestr 142, D-69126 Heidelberg, Germany
[10] Natl Lib Med, Natl Ctr Biotechnol Informat, NIH Bldg 38A, 8600 Rockville Pike, Bethesda, MD 20894 USA
[11] Univ Southern Calif, Dept Prevent Med, Div Bioinformat, Los Angeles, CA 90033 USA
[12] Georgetown Univ, Med Ctr, Protein Informat Resource, Washington, DC 20007 USA
[13] Univ Padua, Dept Biomed Sci, Via U Bassi 58b, I-35131 Padua, Italy
[14] Univ Udine, Dept Agr Sci, Via Palladio 8, I-33100 Udine, Italy
[15] Fdn Edmund Mach, Via E Mach 1, I-38010 San Michele All Adige, Italy
[16] UCL, Struct & Mol Biol, Darwin Bldg, London WC1E 6BT, England
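Funding
Biotechnology and Biological Sciences Research Council (UK); National Science Foundation (USA); Wellcome Trust (UK); National Institutes of Health (USA)
Keywords
FAMILY CLASSIFICATION; GENE ONTOLOGY; DATABASE; PREDICTION; TOPOLOGY; TOOL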
DOI
10.1093/nar/gky1100
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification code
071010; 081704
Abstract
The InterPro database (http://www.ebi.ac.uk/interpro/) classifies protein sequences into families and predicts the presence of functionally important domains and sites. Here, we report recent developments with InterPro (version 70.0) and its associated software, including an 18% growth in the size of the database in terms of new InterPro entries, updates to content, the inclusion of an additional entry type, refined modelling of discontinuous domains, and the development of a new programmatic interface and website. These developments extend and enrich the information provided by InterPro and provide greater flexibility in terms of data access. We also show that InterPro's sequence coverage has kept pace with the growth of UniProtKB, and discuss how our evaluation of residue coverage may help guide future curation activities.
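The abstract contrasts two coverage measures: sequence coverage (read here as the fraction of proteins with at least one match) and residue coverage (the fraction of residues that fall inside at least one match). The sketch below shows one way to compute both. It is an illustration under stated assumptions, not the authors' pipeline: match positions are assumed to be 1-based inclusive intervals, a discontinuous domain is modelled hypothetically as a single match made of several fragments, and the function names and toy accessions are invented.

```python
from typing import Dict, List, Tuple

# Hypothetical data model (not InterPro's own schema):
# a fragment is a (start, end) pair of 1-based, inclusive residue positions,
# and a match is a list of fragments, so a discontinuous domain is simply
# a match with more than one fragment.
Fragment = Tuple[int, int]
Match = List[Fragment]


def covered_residues(length: int, matches: List[Match]) -> int:
    """Count residues of one sequence that lie inside at least one fragment."""
    # Clip every fragment to the sequence bounds and sort by start position.
    frags = sorted(
        (max(1, start), min(length, end))
        for match in matches
        for start, end in match
    )
    total = 0
    run_start, run_end = 0, -1  # an empty run contributes 0 residues
    for start, end in frags:
        if start > end:
            continue  # fragment falls outside the sequence after clipping
        if start > run_end + 1:
            # Disjoint from the current run: bank it and start a new one.
            total += run_end - run_start + 1
            run_start, run_end = start, end
        else:
            # Overlapping or adjacent: extend the current run.
            run_end = max(run_end, end)
    return total + (run_end - run_start + 1)


def coverage(
    lengths: Dict[str, int], matches: Dict[str, List[Match]]
) -> Tuple[float, float]:
    """Return (sequence coverage, residue coverage) for a set of proteins.

    Sequence coverage: fraction of proteins with at least one covered residue.
    Residue coverage: fraction of all residues covered by at least one match.
    """
    hit_proteins = 0
    hit_residues = 0
    for accession, length in lengths.items():
        covered = covered_residues(length, matches.get(accession, []))
        if covered:
            hit_proteins += 1
        hit_residues += covered
    return hit_proteins / len(lengths), hit_residues / sum(lengths.values())


if __name__ == "__main__":
    # Toy accessions and coordinates, invented purely for illustration.
    lengths = {"P1": 100, "P2": 50, "P3": 80}
    matches = {
        "P1": [[(10, 40)], [(30, 60)]],  # two overlapping matches
        "P2": [[(1, 10), (30, 45)]],     # one discontinuous, two-fragment domain
    }
    seq_cov, res_cov = coverage(lengths, matches)
    print(f"sequence coverage: {seq_cov:.3f}, residue coverage: {res_cov:.3f}")
```

Merging overlapping fragments before counting keeps residue coverage from double-counting residues shared by several matches, whereas sequence coverage only asks whether any match exists at all. This is why a database can show high sequence coverage while many residues remain unannotated.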
Pages: D351-D360
Number of pages: 10