Consequences of the discontinuation of the International Protein Index (IPI) database and its substitution by the UniProtKB "complete proteome" sets

Cited by: 25
Authors
Griss, Johannes [1, 2]
Martin, Maria [1 ]
O'Donovan, Claire [1 ]
Apweiler, Rolf [1 ]
Hermjakob, Henning [1 ]
Vizcaino, Juan Antonio [1 ]
Affiliations
[1] EMBL European Bioinformatics Institute, Cambridge CB10 1SD, England
[2] Medical University of Vienna, Department of Medicine I, Vienna, Austria
Funding
Wellcome Trust (UK)
Keywords
Bioinformatics; Discontinuation; Gene annotation; International Protein Index; Protein databases; UniProt Knowledgebase; RESOURCE
DOI
10.1002/pmic.201100363
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
The International Protein Index (IPI) database has been one of the most widely used protein databases in MS proteomics approaches. Recently, the closure of IPI in September 2011 was announced. Its recommended replacement is the new UniProt Knowledgebase (UniProtKB) "complete proteome" sets, launched in May 2011. Here, we analyze the consequences of IPI's discontinuation for human and mouse data, and the effect of its substitution with UniProtKB on two levels: (i) data already produced and (ii) newly performed experiments. To estimate the effect on existing data, we investigated how well IPI identifiers map to UniProtKB accessions. We found that 21% of human and 10% of mouse identifiers do not map to UniProtKB and would thus be "lost." To investigate the impact on new experiments, we compared the theoretical search space (i.e. the tryptic peptides) of both resources and found that it is decreased by 14.0% for human and 8.9% for mouse data through IPI's closure. An analysis of the experimental evidence for these "lost" peptides showed that the vast majority have not been identified in experiments available in the major proteomics repositories. It thus seems likely that the search space provided by UniProtKB is of higher quality than the one currently provided by IPI.
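The search-space comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the digestion rule (cleave after K or R unless the next residue is P, no missed cleavages), the minimum peptide length of 6, the function names, and the toy sequences are all assumptions made for illustration only.

```python
def tryptic_peptides(sequence, min_length=6):
    """Return the set of fully tryptic peptides of one protein sequence.

    Assumed rule: cleave after K or R unless the next residue is P,
    with no missed cleavages. min_length is a hypothetical cutoff.
    """
    peptides = set()
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.add(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Trailing peptide that does not end in K or R
        peptides.add(sequence[start:])
    return {p for p in peptides if len(p) >= min_length}


def search_space(proteins):
    """Union of tryptic peptides over a dict of accession -> sequence."""
    space = set()
    for sequence in proteins.values():
        space |= tryptic_peptides(sequence)
    return space


def lost_fraction(old_proteins, new_proteins):
    """Fraction of peptides in the old search space absent from the new one."""
    old_space = search_space(old_proteins)
    new_space = search_space(new_proteins)
    return len(old_space - new_space) / len(old_space)


# Toy data only: invented sequences and placeholder accessions.
ipi_toy = {
    "IPI_A": "MASTEKLLVNGRPEPTIDEK",
    "IPI_B": "GGSSAAKTTTWWYYR",
}
uniprot_toy = {
    "UP_A": "MASTEKLLVNGRPEPTIDEK",
}

print(lost_fraction(ipi_toy, uniprot_toy))
```

With real data, the two dictionaries would be filled from the FASTA files of the respective IPI and UniProtKB releases. The resulting fraction corresponds in spirit to the 14.0% (human) and 8.9% (mouse) decreases reported above, although the exact figures depend on digestion parameters that the abstract does not state.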
Pages: 4434-4438
Page count: 5