k-Information Gain Scaled Nearest Neighbors: A Novel Approach to Classifying Protein-Protein Interaction-Related Documents

Cited by: 17
Authors
Ambert, Kyle H. [1 ]
Cohen, Aaron M. [1 ]
Affiliations
[1] Oregon Hlth & Sci Univ, Dept Med Informat & Clin Epidemiol, Portland, OR 97239 USA
Keywords
Protein-protein interaction; k-nearest neighbor; information gain; support vector machine; text classification; MACHINE; BIND;
DOI
10.1109/TCBB.2011.32
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
Although publicly accessible databases containing protein-protein interaction (PPI)-related information are important resources to bench and in silico research scientists alike, the amount of time and effort required to keep them up to date is often burdensome. Text-mining tools from the machine learning discipline can help identify relevant PPI publications. Here, we describe and evaluate two document classification algorithms that we submitted to the BioCreative II.5 PPI Classification Challenge Task. This task asked participants to design classifiers for identifying documents containing PPI-related information in the primary literature, and evaluated them against one another. One of our systems was the overall best-performing system submitted to the challenge task. It uses a novel approach to k-nearest neighbor classification, which we describe here, and we compare its performance to that of two support vector machine-based classification systems, one of which was also evaluated in the challenge task.
Pages: 305-310
Page count: 6
Related papers
27 in total
[1]   A System for Classifying Disease Comorbidity Status from Medical Discharge Summaries Using Automated Hotspot and Negated Concept Detection [J].
Ambert, Kyle H. ;
Cohen, Aaron M. .
JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2009, 16 (04) :590-595
[2]  
[Anonymous], 2000, KDD WORKSH TEXT MIN
[3]  
[Anonymous], 1993, Proceedings of the 13th International Joint Conference on Artificial Intelligence
[4]   BIND - a data specification for storing and describing biomolecular interactions, molecular complexes and pathways [J].
Bader, GD ;
Hogue, CWV .
BIOINFORMATICS, 2000, 16 (05) :465-477
[5]   BIND - The Biomolecular Interaction Network Database [J].
Bader, GD ;
Donaldson, I ;
Wolting, C ;
Ouellette, BFF ;
Pawson, T ;
Hogue, CWV .
NUCLEIC ACIDS RESEARCH, 2001, 29 (01) :242-245
[6]  
Chang C.-C., Lin C.-J., LIBSVM: A Library for Support Vector Machines
[7]  
Cohen Aaron M, 2006, AMIA Annu Symp Proc, P161
[8]   Cross-Topic Learning for Work Prioritization in Systematic Review Creation and Update [J].
Cohen, Aaron M. ;
Ambert, Kyle ;
McDonach, Marian .
JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2009, 16 (05) :690-704
[9]   A survey of current work in biomedical text mining [J].
Cohen, AM ;
Hersh, WR .
BRIEFINGS IN BIOINFORMATICS, 2005, 6 (01) :57-71
[10]   PreBIND and Textomy - mining the biomedical literature for protein-protein interactions using a support vector machine [J].
Donaldson, I ;
Martin, J ;
de Bruijn, B ;
Wolting, C ;
Lay, V ;
Tuekam, B ;
Zhang, SD ;
Baskin, B ;
Bader, GD ;
Michalickova, K ;
Pawson, T ;
Hogue, CWV .
BMC BIOINFORMATICS, 2003, 4 (1)