Exploiting PubMed for Protein Molecular Function Prediction via NMF based Multi-Label Classification

Cited by: 6
Authors
Fodeh, Samah [1]
Tiwari, Aditya [2]
Yu, Hong [3]
Affiliations
[1] Yale Univ, Yale Sch Med, New Haven, CT 06520 USA
[2] Univ Massachusetts, Amherst, MA USA
[3] Univ Massachusetts, Sch Med, Worcester, MA USA
Keywords
Gene molecular function; classification; NMF; annotation; GO; KNN; AUTOMATIC EXTRACTION; ANNOTATION; TEXT
DOI
10.1109/ICDMW.2017.64
CLC number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Gene Ontology (GO) defines terms and classes used to describe gene functions and the relationships between them. GO has become the standard for describing the functions of specific genes in different model organisms. GO annotation, which tags genes with GO terms, has mostly been a manual and time-consuming curation process. In this paper we describe the development and evaluation of an innovative predictive system that automatically assigns molecular functions (GO terms) to a gene, using the biomedical literature as a resource. We treated GO term assignment as a multi-label, multi-class classification problem. Rather than the commonly used bag-of-words representation, we used non-negative matrix factorization (NMF) for feature reduction and then classified the genes. To address the multi-label nature of the data, we used the binary relevance method. We experimented with different classifiers and found that combining binary relevance with a K-nearest neighbor (KNN) classifier gave the best performance. Our evaluation on the UniProtKB/Swiss-Prot dataset showed a best F-measure of 0.83.
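The abstract outlines a three-stage pipeline: NMF reduces each gene's literature-derived term vector to a few latent features, binary relevance splits the multi-label problem into one yes/no decision per GO term, and a KNN classifier makes each decision, with results scored by the F-measure. The following is a minimal sketch of that general technique under stated assumptions, not the authors' implementation. The toy term counts, vocabulary, and label names are hypothetical; Lee-Seung multiplicative-update NMF, cosine similarity, a majority-vote threshold, k = 3, two latent topics, and micro-averaging of the F-measure are all assumed choices; and every helper function here is illustrative. It is written in pure Python so it runs without third-party packages.

```python
import math
import random

# Sketch only: NMF feature reduction + binary-relevance KNN for multi-label
# GO term prediction. Data, labels, and parameters are hypothetical.


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def nmf(V, k, iters=300, seed=0, H=None):
    """Approximate non-negative V (n x m) as W (n x k) times H (k x m) with
    Lee-Seung multiplicative updates for squared error. If H is supplied it
    is held fixed and only W is fitted, which projects new documents onto
    topics learned from the training documents."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    fit_h = H is None
    if fit_h:
        H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    eps = 1e-9  # guards against division by zero
    for _ in range(iters):
        if fit_h:
            Wt = transpose(W)
            num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
            H = [[h * nu / (de + eps) for h, nu, de in zip(hr, nr, dr)]
                 for hr, nr, dr in zip(H, num, den)]
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * nu / (de + eps) for w, nu, de in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def br_knn_predict(train_X, train_Y, test_X, labels, k=3):
    """Binary relevance: an independent yes/no decision per label. Each
    decision is a majority vote among the k most cosine-similar genes."""
    preds = []
    for x in test_X:
        order = sorted(range(len(train_X)), key=lambda i: -cosine(x, train_X[i]))
        nbrs = order[:k]
        preds.append({lab for lab in labels
                      if 2 * sum(lab in train_Y[i] for i in nbrs) > k})
    return preds


def micro_f1(true_sets, pred_sets):
    """Micro-averaged F-measure over all (gene, label) decisions."""
    tp = sum(len(t & p) for t, p in zip(true_sets, pred_sets))
    fp = sum(len(p - t) for t, p in zip(true_sets, pred_sets))
    fn = sum(len(t - p) for t, p in zip(true_sets, pred_sets))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


# Hypothetical term counts over ["dna", "promoter", "transcription",
# "kinase", "phosphorylation", "atp"], one row per gene's literature.
train_V = [[5, 4, 3, 0, 0, 0], [4, 5, 4, 0, 0, 0],
           [0, 0, 0, 5, 4, 3], [0, 0, 0, 4, 5, 4]]
train_Y = [{"DNA binding"}, {"DNA binding"},
           {"kinase activity", "ATP binding"}, {"kinase activity", "ATP binding"}]
test_V = [[5, 3, 4, 0, 0, 1], [0, 0, 1, 5, 5, 3]]
test_Y = [{"DNA binding"}, {"kinase activity", "ATP binding"}]

labels = sorted(set().union(*train_Y))
W_train, H = nmf(train_V, k=2)        # learn 2 latent topics from training genes
W_test, _ = nmf(test_V, k=2, H=H)     # project test genes onto the same topics
pred = br_knn_predict(W_train, train_Y, W_test, labels, k=3)

for gene_labels in pred:
    print(sorted(gene_labels))
print("micro F-measure:", micro_f1(test_Y, pred))
```

Holding `H` fixed when projecting the test genes keeps training and test features in the same latent space, which the KNN step needs. Because KNN uses the same neighbors for every label, binary relevance here reduces to one vote per GO term over a shared neighbor set. Real PubMed-derived matrices would also need term weighting and tuning of the topic count and k, none of which the abstract specifies.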
Pages: 446-451
Number of pages: 6
Related papers
(50 in total)
  • [21] NMF-Based Stochastic Models for Multi-Label Propagation
    Sun, Liang
    Ge, Hongwei
    Sun, Weiting
    2017 13TH INTERNATIONAL CONFERENCE ON NATURAL COMPUTATION, FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY (ICNC-FSKD), 2017, : 1141 - 1147
  • [22] A Deep Neural Network Based Hierarchical Multi-Label Classifier for Protein Function Prediction
    Yuan, Xin
    Li, Weite
    Lin, Kui
    Hu, Jinglu
    PROCEEDING OF THE 2019 INTERNATIONAL CONFERENCE ON COMPUTER, INFORMATION AND TELECOMMUNICATION SYSTEMS (IEEE CITS 2019), 2019, : 131 - 135
  • [23] Interval prediction for graded multi-label classification
    Lastra, Gerardo
    Luaces, Oscar
    Bahamonde, Antonio
    PATTERN RECOGNITION LETTERS, 2014, 49 : 171 - 176
  • [24] Extreme Multi-Label Text Classification Based on Balance Function
    Chen, Zhaohong
    Hong, Zhiyong
    Yu, Wenhua
    Zhang, Xin
    Computer Engineering and Applications, 2024, 60 (04) : 163 - 172
  • [25] Partial Multi-Label Learning via Exploiting Instance and Label Correlations
    Liang, Weichao
    Gao, Guangliang
    Chen, Lei
    Wang, Youquan
    ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2024, 19 (01)
  • [26] Online Multi-Instance Multi-Label Learning for Protein Function Prediction
    Wu, Feng
    Liu, Qiong
    Hao, Tianyong
    Chen, Xiaojun
    Wu, Qingyao
    2016 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2016, : 780 - 785
  • [27] SPL: EXPLOITING UNLABELED DATA FOR MULTI-LABEL IMAGE CLASSIFICATION
    Zhang, Weibo
    Zhu, Fuqing
    Dai, Jiao
    Hu, Songlin
    Han, Jizhong
    Guo, Tao
    2019 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2019, : 157 - 162
  • [28] Multi-label classification by exploiting local positive and negative pairwise label correlation
    Huang, Jun
    Li, Guorong
    Wang, Shuhui
    Xue, Zhe
    Huang, Qingming
    NEUROCOMPUTING, 2017, 257 : 164 - 174
  • [29] Multi-label classification and label dependence in in silico toxicity prediction
    Yap, Xiu Huan
    Raymer, Michael
    TOXICOLOGY IN VITRO, 2021, 74
  • [30] Protein Function Prediction Using Multi-label Learning and ISOMAP Embedding
    Liang, Huadong
    Sun, Dengdi
    Ding, Zhuanlian
    Ge, Meiling
    BIO-INSPIRED COMPUTING - THEORIES AND APPLICATIONS, BIC-TA 2015, 2015, 562 : 249 - 259