Exploiting PubMed for Protein Molecular Function Prediction via NMF based Multi-Label Classification

Cited by: 6
Authors
Fodeh, Samah [1]
Tiwari, Aditya [2]
Yu, Hong [3]
Affiliations
[1] Yale Univ, Yale Sch Med, New Haven, CT 06520 USA
[2] Univ Massachusetts, Amherst, MA USA
[3] Univ Massachusetts, Sch Med, Worcester, MA USA
Source
2017 17TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW 2017) | 2017
Keywords
Gene molecular function; classification; NMF; annotation; GO; KNN; AUTOMATIC EXTRACTION; ANNOTATION; TEXT;
DOI
10.1109/ICDMW.2017.64
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Gene Ontology (GO) defines terms and classes used to describe gene functions and the relationships between them. GO has been the standard for describing the functions of specific genes in different model organisms. GO annotation, which tags genes with GO terms, has mostly been a manual and time-consuming curation process. In this paper we describe the development and evaluation of an innovative predictive system that automatically assigns a gene its molecular functions (GO terms) using biomedical literature as a resource. We treated GO term assignment as a multi-label, multi-class classification problem. Rather than the commonly used bag-of-words approach, we used non-negative matrix factorization (NMF) for feature reduction and then performed the classification of genes. To address the multi-label aspect of the data, we used the binary-relevance method. We experimented with different classifiers and found that the combination of binary relevance and a K-nearest neighbor (KNN) classifier gave the best performance. Our evaluation on the UniProtKB/Swiss-Prot dataset showed a best performance of 0.83 in terms of F-measure.
Pages: 446-451
Page count: 6
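
The pipeline outlined in the abstract (NMF for feature reduction, then binary relevance with a KNN classifier) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy gene-by-term matrix `V`, the label matrix `Y`, the rank, the value of `k`, the use of multiplicative-update NMF, and the leave-one-out loop are all assumptions made for illustration, and the helper names (`nmf`, `knn_binary_relevance`) are hypothetical.

```python
import random

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def nmf(V, rank, iters=200, seed=0):
    """Approximate non-negative V (n x m) as W (n x rank) times H (rank x m)
    using multiplicative updates for the squared-error objective."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    eps = 1e-9  # guards against division by zero
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num = matmul(Wt, V)
        den = matmul(matmul(Wt, W), H)
        H = [[H[r][j] * num[r][j] / (den[r][j] + eps) for j in range(m)]
             for r in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num = matmul(V, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[W[i][r] * num[i][r] / (den[i][r] + eps) for r in range(rank)]
             for i in range(n)]
    return W, H

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def knn_binary_relevance(train_X, train_Y, x, k=3):
    """Predict a 0/1 label vector for x. Binary relevance treats every label
    as its own binary problem; here each one is a majority vote over the
    k nearest training points."""
    nearest = sorted(range(len(train_X)),
                     key=lambda i: sq_dist(train_X[i], x))[:k]
    n_labels = len(train_Y[0])
    return [int(2 * sum(train_Y[i][j] for i in nearest) > k)
            for j in range(n_labels)]

# Toy data (assumed): rows are genes, columns are term counts from abstracts.
V = [
    [3, 2, 1, 0, 0, 0],
    [2, 3, 1, 0, 0, 0],
    [3, 3, 2, 0, 0, 0],
    [0, 0, 0, 2, 3, 1],
    [0, 0, 1, 3, 2, 2],
    [0, 0, 0, 1, 2, 3],
]
# Two hypothetical GO terms per gene; a gene may carry both.
Y = [[1, 0], [1, 0], [1, 1], [0, 1], [0, 1], [0, 1]]

W, H = nmf(V, rank=2)  # rows of W are the reduced gene features

# Leave-one-out prediction in the reduced feature space.
for i in range(len(V)):
    rest_X = W[:i] + W[i + 1:]
    rest_Y = Y[:i] + Y[i + 1:]
    print(i, knn_binary_relevance(rest_X, rest_Y, W[i], k=3))
```

Factorizing all genes together and then holding one out is a simplification; a real evaluation would need a way to project unseen genes onto the learned factors, which the abstract does not describe.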