Exploiting PubMed for Protein Molecular Function Prediction via NMF based Multi-Label Classification

Cited by: 6
Authors
Fodeh, Samah [1 ]
Tiwari, Aditya [2 ]
Yu, Hong [3 ]
Affiliations
[1] Yale Univ, Yale Sch Med, New Haven, CT 06520 USA
[2] Univ Massachusetts, Amherst, MA USA
[3] Univ Massachusetts, Sch Med, Worcester, MA USA
Keywords
Gene molecular function; classification; NMF; annotation; GO; KNN; AUTOMATIC EXTRACTION; ANNOTATION; TEXT;
DOI
10.1109/ICDMW.2017.64
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Gene Ontology (GO) defines terms and classes used to describe gene functions and the relationships between them. GO has become the standard for describing the functions of specific genes in different model organisms. GO annotation, which tags genes with GO terms, has mostly been a manual and time-consuming curation process. In this paper, we describe the development and evaluation of an innovative predictive system that automatically assigns a gene its molecular functions (GO terms) using biomedical literature as a resource. We treated GO term assignment as a multi-label, multi-class classification problem. Rather than the commonly used bag-of-words approach, we used non-negative matrix factorization (NMF) for feature reduction and then performed the classification of genes. To address the multi-label aspect of the data, we used the binary relevance method. We experimented with different classifiers and found that the combination of binary relevance and a K-nearest neighbor (KNN) classifier gave the best performance. Our evaluation on the UniProtKB/Swiss-Prot dataset showed a best F-measure of 0.83.
Pages: 446-451
Number of pages: 6
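The pipeline outlined in the abstract (NMF for feature reduction, then binary relevance with a KNN classifier) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the toy term-count matrix, the two unnamed labels, the choice of two NMF components and three neighbors, and the helper names `nmf` and `knn_binary_relevance` are all assumptions made for illustration. The NMF here uses the standard multiplicative update rules, which the record does not confirm the paper used.

```python
import numpy as np

def nmf(V, k, n_iter=200, H=None, seed=0, eps=1e-9):
    """Factor non-negative V (n x m) as W (n x k) @ H (k x m) using
    multiplicative updates. If H is supplied it is held fixed, which
    projects new rows of V onto an already learned basis."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], k)) + eps
    fixed_H = H is not None
    if not fixed_H:
        H = rng.random((k, V.shape[1])) + eps
    for _ in range(n_iter):
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        if not fixed_H:
            H *= (W.T @ V) / (W.T @ W @ H + eps)
    return W, H

def knn_binary_relevance(X_train, Y_train, X_test, n_neighbors=3):
    """Binary relevance with KNN: every label column is decided by its
    own majority vote among the nearest training rows."""
    # Squared Euclidean distance from each test row to each training row.
    d = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d, axis=1)[:, :n_neighbors]
    votes = Y_train[nearest].mean(axis=1)  # shape: (n_test, n_labels)
    return (votes > 0.5).astype(int)

# Hypothetical toy data: rows are genes, columns are term counts taken
# from the literature associated with each gene.
V_train = np.array([
    [3, 2, 1, 0, 0, 0],
    [2, 3, 1, 0, 0, 0],
    [0, 0, 0, 1, 2, 3],
    [0, 0, 1, 1, 3, 2],
    [2, 2, 1, 1, 2, 2],
    [3, 1, 1, 2, 2, 1],
], dtype=float)
# Hypothetical label matrix: one column per GO term, 1 = annotated.
Y_train = np.array([[1, 0], [1, 0], [0, 1], [0, 1], [1, 1], [1, 1]])
V_test = np.array([[3, 3, 1, 0, 0, 0]], dtype=float)

# Learn a 2-dimensional representation, then project the unseen gene
# onto the same basis before classifying it.
W_train, H = nmf(V_train, k=2)
W_test, _ = nmf(V_test, k=2, H=H)
Y_pred = knn_binary_relevance(W_train, Y_train, W_test, n_neighbors=3)
print(Y_pred)
```

Because KNN has no per-label training step, a single neighbor search serves every label here, and only the vote is taken per label. With classifiers that do need training, binary relevance would instead fit one independent model per label column.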