Exploiting PubMed for Protein Molecular Function Prediction via NMF based Multi-Label Classification

Cited by: 6
Authors
Fodeh, Samah [1 ]
Tiwari, Aditya [2 ]
Yu, Hong [3 ]
Affiliations
[1] Yale Univ, Yale Sch Med, New Haven, CT 06520 USA
[2] Univ Massachusetts, Amherst, MA USA
[3] Univ Massachusetts, Sch Med, Worcester, MA USA
Keywords
Gene molecular function; classification; NMF; annotation; GO; KNN; AUTOMATIC EXTRACTION; ANNOTATION; TEXT;
DOI
10.1109/ICDMW.2017.64
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Gene Ontology (GO) defines the terms and classes used to describe gene functions and the relationships between them. GO has become the standard for describing the functions of specific genes in different model organisms. GO annotation, which tags genes with GO terms, has mostly been a manual and time-consuming curation process. In this paper we describe the development and evaluation of a predictive system that automatically assigns a gene its molecular functions (GO terms) using biomedical literature as a resource. We treated GO term assignment as a multi-label, multi-class classification problem. Rather than relying on the commonly used bag-of-words representation, we used non-negative matrix factorization (NMF) for feature reduction and then classified the genes. To address the multi-label aspect of the data, we used the binary-relevance method. We experimented with different classifiers and found that the combination of binary relevance and a K-nearest neighbor (KNN) classifier gave the best performance. Our evaluation on a UniProtKB/Swiss-Prot dataset showed a best F-measure of 0.83.
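The pipeline outlined in the abstract (text features, NMF feature reduction, binary relevance, KNN) might be sketched roughly as follows. This is not the authors' implementation: the use of scikit-learn, the TF-IDF weighting, the toy documents, the placeholder label names, and the values of `n_topics` and `k` are all assumptions made for illustration.

```python
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical toy data: one text snippet per gene and a set of
# placeholder function labels (not real GO identifiers).
docs = [
    "kinase phosphorylates substrate protein using atp",
    "transcription factor binds dna promoter region",
    "transporter hydrolyzes atp to move ions",
    "repressor protein binds dna operator sequence",
]
labels = [
    {"kinase_activity", "atp_binding"},
    {"dna_binding"},
    {"atp_binding"},
    {"dna_binding"},
]


def fit_predict(train_docs, train_labels, test_docs, n_topics=2, k=1):
    """Train on (docs, label sets) and return predicted label tuples."""
    # Term weights are non-negative, which NMF requires.
    vectorizer = TfidfVectorizer()
    x_train = vectorizer.fit_transform(train_docs)

    # NMF maps each document to n_topics non-negative latent features.
    nmf = NMF(n_components=n_topics, init="nndsvda",
              random_state=0, max_iter=500)
    w_train = nmf.fit_transform(x_train)

    # Binary relevance: one independent binary KNN classifier per label.
    binarizer = MultiLabelBinarizer()
    y_train = binarizer.fit_transform(train_labels)
    model = OneVsRestClassifier(KNeighborsClassifier(n_neighbors=k))
    model.fit(w_train, y_train)

    # Project unseen documents into the same latent space, then classify.
    w_test = nmf.transform(vectorizer.transform(test_docs))
    y_pred = model.predict(w_test)
    return binarizer.inverse_transform(y_pred), list(binarizer.classes_)


predictions, classes = fit_predict(
    docs, labels, ["protein binds dna promoter sequence"]
)
print(classes)
print(predictions)
```

Here `OneVsRestClassifier` fitted on a binary indicator matrix trains one classifier per label, which is the usual way to express binary relevance in scikit-learn. On a real corpus, `n_topics` and `k` would need to be tuned, for example by cross-validated F-measure.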
Pages: 446-451
Page count: 6