Improving large-scale hierarchical classification by rewiring: a data-driven filter based approach

Authors
Azad Naik
Huzefa Rangwala
Affiliations
[1] Microsoft Corporation
[2] George Mason University
Keywords
Top-down hierarchical classification; Inconsistency; Error propagation; Flattening; Clustering; Rewiring
DOI
Not available
Abstract
Hierarchical Classification (HC) is a supervised learning problem in which unlabeled instances are classified into a taxonomy of classes. Several methods that exploit the hierarchical structure have been developed to improve HC performance. However, in most cases the hierarchy defined a priori by domain experts is inconsistent; as a consequence, the performance improvement over flat classification methods is not noticeable. We propose a scalable, data-driven, filter-based rewiring approach to modify an expert-defined hierarchy. Experimental comparisons of top-down hierarchical classification using our modified hierarchy on a wide range of datasets show improved classification performance over the baseline hierarchy (i.e., the one defined by experts), a clustered hierarchy, and flattening-based hierarchy modification approaches. Compared with existing rewiring approaches, our method (rewHier) is computationally efficient, which enables it to scale to datasets with large numbers of classes, instances, and features. We also show that our modified hierarchy improves classification performance for classes with few training samples compared with flat and state-of-the-art hierarchical classification approaches. Source Code: https://cs.gmu.edu/~mlbio/TaxMod/
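The abstract evaluates its modified hierarchy with top-down hierarchical classification and compares against flattening-based hierarchy modification. The following is a minimal, hypothetical sketch of those two generic ideas only; it is not the rewHier algorithm, whose filter-based rewiring details are not given in this record. The `Node` class, the `top_down_predict` and `flatten` helpers, the per-node scorer functions, and the toy taxonomy are all illustrative assumptions. Prediction walks greedily from the root, so a wrong choice near the top cannot be corrected deeper down (error propagation); flattening, in one common form, removes an internal node and attaches its children to that node's parent.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical per-node scorer: maps a feature vector to a real-valued score.
Scorer = Callable[[List[float]], float]


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)


def top_down_predict(root: Node, x: List[float], scorers: Dict[str, Scorer]) -> List[str]:
    """Greedy top-down prediction: at each internal node pick the child with the
    highest score and descend until a leaf is reached. An early mistake is never
    revisited, which is how errors propagate down the hierarchy."""
    path: List[str] = []
    node = root
    while node.children:
        node = max(node.children, key=lambda c: scorers[c.name](x))
        path.append(node.name)
    return path


def flatten(root: Node, name: str) -> bool:
    """Remove the internal node `name` and attach its children to its parent.
    Returns True if such a node was found and removed."""
    for i, child in enumerate(root.children):
        if child.name == name and child.children:
            root.children[i:i + 1] = child.children  # splice grandchildren in place
            return True
        if flatten(child, name):
            return True
    return False


# Toy taxonomy and linear scorers (illustrative assumptions, not from the paper).
root = Node("root", [
    Node("Science", [Node("Physics"), Node("Biology")]),
    Node("Sports", [Node("Tennis")]),
])
scorers: Dict[str, Scorer] = {
    "Science": lambda v: v[0],
    "Sports": lambda v: v[1],
    "Physics": lambda v: v[0] - v[1],
    "Biology": lambda v: v[1] - v[0],
    "Tennis": lambda v: v[1],
}

before = top_down_predict(root, [0.9, 0.1], scorers)  # ['Science', 'Physics']
flatten(root, "Science")                              # Physics and Biology move up to root
after = top_down_predict(root, [0.9, 0.1], scorers)   # ['Physics']
```

Flattening shortens the decision path, so fewer greedy decisions can go wrong; the trade-off is that the root-level classifier must now separate more classes at once.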
Pages: 141-164
Number of pages: 23