Negatome 2.0: a database of non-interacting proteins derived by literature mining, manual annotation and protein structure analysis

Times cited: 114
Authors
Blohm, Philipp [1 ,2 ]
Frishman, Goar [1 ]
Smialowski, Pawel [1 ,3 ]
Goebels, Florian [3 ]
Wachinger, Benedikt [1 ,2 ]
Ruepp, Andreas [1 ]
Frishman, Dmitrij [1 ,3 ]
Affiliations
[1] HMGU German Res Ctr Environm Hlth, Inst Bioinformat & Syst Biol MIPS, D-85764 Neuherberg, Germany
[2] Clueda AG, D-80687 Munich, Germany
[3] Tech Univ Munich, Dept Genome Oriented Bioinformat, D-85350 Freising Weihenstephan, Germany
Keywords
EXTRACTION; NEGATION; DOMAIN; PDB;
DOI
10.1093/nar/gkt1079
CLC classification
Q5 [Biochemistry]; Q7 [Molecular Biology];
Subject classification codes
071010; 081704;
Abstract
Knowledge about non-interacting proteins (NIPs) is important for training algorithms that predict protein-protein interactions (PPIs) and for assessing the false positive rates of PPI detection efforts. We present the second version of Negatome, a database of proteins and protein domains that are unlikely to engage in physical interactions (available online at http://mips.helmholtz-muenchen.de/proj/ppi/negatome). Negatome is derived by manual curation of the literature and by analyzing three-dimensional structures of protein complexes. The main methodological innovation in Negatome 2.0 is the use of an advanced text mining procedure to guide the manual annotation process. Potential non-interactions were identified by a modified version of Excerbt, a text mining tool based on semantic sentence analysis. Manual verification shows that nearly half of the text mining results with the highest confidence values correspond to NIP pairs. Compared with the first version, the contents of the database have grown by over 300%.
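One use named in the abstract is assessing false positives in PPI detection. A minimal sketch follows, assuming a set of non-interacting pairs is available as a list of protein identifier pairs. The identifiers (`P1`, `P2`, ...) and the functions `normalize`, `build_negative_set`, `flag_conflicts` and `conflict_rate` are hypothetical illustrations, not part of Negatome or its download format. It only flags detected interactions that also appear in the negative set; such overlap suggests likely false positives but does not measure the full false positive rate.

```python
from typing import Iterable

Pair = tuple[str, str]


def normalize(a: str, b: str) -> Pair:
    """Store a pair in canonical order so (A, B) and (B, A) compare equal."""
    return (a, b) if a <= b else (b, a)


def build_negative_set(pairs: Iterable[Pair]) -> set[Pair]:
    """Build a lookup set of non-interacting pairs (hypothetical input format)."""
    return {normalize(a, b) for a, b in pairs}


def flag_conflicts(detected: Iterable[Pair], negatives: set[Pair]) -> list[Pair]:
    """Return the unique detected interactions that are listed as non-interacting."""
    unique_detected = {normalize(a, b) for a, b in detected}
    return sorted(p for p in unique_detected if p in negatives)


def conflict_rate(detected: Iterable[Pair], negatives: set[Pair]) -> float:
    """Fraction of unique detected pairs that conflict with the negative set."""
    unique_detected = {normalize(a, b) for a, b in detected}
    if not unique_detected:
        return 0.0
    conflicts = sum(1 for p in unique_detected if p in negatives)
    return conflicts / len(unique_detected)


if __name__ == "__main__":
    # Toy, made-up data for illustration only.
    negatives = build_negative_set([("P1", "P2"), ("P3", "P4")])
    detected = [("P2", "P1"), ("P5", "P6"), ("P3", "P7"), ("P4", "P3")]
    print(flag_conflicts(detected, negatives))
    print(conflict_rate(detected, negatives))
```

Canonical ordering matters because PPI data are undirected: without it, a detected pair reported as (B, A) would silently miss a negative entry stored as (A, B).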
Pages: D396-D400
Number of pages: 5