Leveraging class hierarchy for detecting missing annotations on hierarchical multi-label classification

Cited by: 4
Authors
Romero, Miguel [1]
Nakano, Felipe Kenji [2,3]
Finke, Jorge [1]
Rocha, Camilo [1]
Vens, Celine [2,3]
Affiliations
[1] Pontificia Univ Javeriana, Dept Elect & Comp Sci, Calle 18 N 118-250, Cali 760031, Colombia
[2] KU Leuven Campus KULAK, Dept Publ Hlth & Primary Care, Etienne Sabbelaan 53, B-8500 Kortrijk, Belgium
[3] Katholieke Univ Leuven, Itec, imec Res Grp, Etienne Sabbelaan 53, B-8500 Kortrijk, Belgium
Keywords
Detecting missing annotations; Hierarchical multi-label classification; Structured output prediction; Gene function prediction; Gene ontology hierarchy; Random forest; Tree ensembles; GENE; INFORMATION; PREDICTION; ONTOLOGY; DATABASE
DOI
10.1016/j.compbiomed.2022.106423
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
With the development of new sequencing technologies, the availability of genomic data has grown exponentially. Over the past decade, numerous studies have used genomic data to identify associations between genes and biological functions. While these studies have shown success in annotating genes with functions, they often assume that genes are completely annotated and fail to take into account that datasets are sparse and noisy. This work proposes a method to detect missing annotations in the context of hierarchical multi-label classification. More precisely, our method exploits the relations between functions, represented as a hierarchy, by computing probabilities based on the paths of functions in the hierarchy. By performing several experiments on a variety of rice (Oryza sativa Japonica), we showcase that the proposed method accurately detects missing annotations and yields superior results when compared to state-of-the-art methods from the literature.
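The abstract does not spell out how the path-based probabilities are computed, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a tree-shaped hierarchy (the Gene Ontology is actually a directed acyclic graph), assumes per-function probabilities are already available from some classifier, and assumes, purely for illustration, that a function's path score is the product of the probabilities along its path to the root. The names `path_to_root`, `path_score`, `candidate_missing`, and the 0.5 threshold are all hypothetical.

```python
from typing import Dict, List, Optional, Set


def path_to_root(term: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return the terms on the path from `term` up to the root (inclusive)."""
    path = [term]
    while parent.get(path[-1]) is not None:
        path.append(parent[path[-1]])
    return path


def path_score(term: str, prob: Dict[str, float],
               parent: Dict[str, Optional[str]]) -> float:
    """Illustrative path score: product of probabilities along the path.

    This aggregation is an assumption made for the sketch; the paper may
    combine path information differently.
    """
    score = 1.0
    for t in path_to_root(term, parent):
        score *= prob[t]
    return score


def candidate_missing(prob: Dict[str, float],
                      parent: Dict[str, Optional[str]],
                      annotated: Set[str],
                      threshold: float = 0.5) -> List[str]:
    """Flag unannotated terms whose path score reaches the threshold."""
    return sorted(
        t for t in prob
        if t not in annotated and path_score(t, prob, parent) >= threshold
    )


# Toy hierarchy: root -> A -> B, and root -> C (hypothetical function names).
parent = {"root": None, "A": "root", "B": "A", "C": "root"}
# Hypothetical classifier outputs for one gene.
prob = {"root": 1.0, "A": 0.9, "B": 0.8, "C": 0.3}
# Functions the gene is currently annotated with.
annotated = {"root", "A"}

print(candidate_missing(prob, parent, annotated))  # ['B']
```

In this toy example the gene is not annotated with B, yet B and all of its ancestors receive high probabilities, so B is flagged as a candidate missing annotation, while C is not.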
Pages: 11