Negatome 2.0: a database of non-interacting proteins derived by literature mining, manual annotation and protein structure analysis

Times cited: 125
Authors
Blohm, Philipp [1, 2]
Frishman, Goar [1]
Smialowski, Pawel [1, 3]
Goebels, Florian [3]
Wachinger, Benedikt [1, 2]
Ruepp, Andreas [1]
Frishman, Dmitrij [1, 3]
Affiliations
[1] HMGU German Res Ctr Environm Hlth, Inst Bioinformat & Syst Biol MIPS, D-85764 Neuherberg, Germany
[2] Clueda AG, D-80687 Munich, Germany
[3] Tech Univ Munich, Dept Genome Oriented Bioinformat, D-85350 Freising Weihenstephan, Germany
Keywords
EXTRACTION; NEGATION; DOMAIN; PDB
DOI
10.1093/nar/gkt1079
Chinese Library Classification (CLC) numbers
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Knowledge about non-interacting proteins (NIPs) is important for training algorithms that predict protein-protein interactions (PPIs) and for assessing the false positive rates of PPI detection efforts. We present the second version of Negatome, a database of proteins and protein domains that are unlikely to engage in physical interactions (available online at http://mips.helmholtz-muenchen.de/proj/ppi/negatome). Negatome is derived by manual curation of the literature and by analyzing three-dimensional structures of protein complexes. The main methodological innovation in Negatome 2.0 is the use of an advanced text mining procedure to guide the manual annotation process. Potential non-interactions were identified by a modified version of Excerbt, a text mining tool based on semantic sentence analysis. Manual verification shows that nearly half of the text mining results with the highest confidence values correspond to NIP pairs. Compared to the first version, the contents of the database have grown by over 300%.
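The abstract notes that NIP sets help assess the false positive rates of PPI detection efforts. The sketch below shows one crude way such a check could look. It is not the method used in the paper. It assumes, hypothetically, that both the screen's reported interactions and the negative reference set are available as pairs of protein identifiers, and it uses "both proteins occur in the screen" as an assumed proxy for "the screen could have tested this pair". The function names `to_unordered` and `negative_hit_rate` are illustrative only.

```python
from typing import Iterable


def to_unordered(pairs: Iterable[tuple[str, str]]) -> set[frozenset[str]]:
    """Represent each protein pair as a frozenset so (A, B) and (B, A) match."""
    return {frozenset(p) for p in pairs}


def negative_hit_rate(
    reported: Iterable[tuple[str, str]],
    negatives: Iterable[tuple[str, str]],
) -> tuple[int, int, float]:
    """Crude check of a PPI screen against a negative reference set.

    Assumption: only negative pairs whose proteins both occur in the screen
    are counted, as a rough proxy for pairs the screen could have tested.
    Returns (negative pairs reported as interacting, testable negative
    pairs, their ratio).
    """
    reported_set = to_unordered(reported)
    # All proteins that appear in at least one reported interaction.
    screened = {protein for pair in reported_set for protein in pair}
    # Negative pairs whose proteins are all covered by the screen.
    testable = {pair for pair in to_unordered(negatives) if pair <= screened}
    # Known non-interactions that the screen nevertheless reports.
    hits = testable & reported_set
    rate = len(hits) / len(testable) if testable else 0.0
    return len(hits), len(testable), rate


# Toy identifiers, not real Negatome entries.
screen = [("A", "B"), ("B", "C"), ("C", "D")]
negative_set = [("B", "A"), ("A", "D"), ("E", "F")]
print(negative_hit_rate(screen, negative_set))  # (1, 2, 0.5)
```

Using `frozenset` makes pair matching order-independent, which matters because interaction and non-interaction records rarely agree on which partner is listed first. A high ratio suggests the screen reports many pairs believed not to interact, but the estimate inherits any errors and coverage gaps of the negative set itself.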
Pages: D396-D400
Number of pages: 5