Effects of the hierarchy in hierarchical, multi-label classification

Cited by: 6
Authors
Daisey, Katie [1, 2]
Brown, Steven D. [1]
Affiliations
[1] Univ Delaware, Dept Chem & Biochem, 163 Green, Newark, DE 19716 USA
[2] Arkema Inc, 900 First Ave, King Of Prussia, PA 19406 USA
Funding
U.S. National Science Foundation
Keywords
Hierarchical classification; Multi-label classification; Machine learning; Hierarchical multi-label models; UNCERTAINTY; TREE;
DOI
10.1016/j.chemolab.2020.104177
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The consequences of the choice of hierarchy in hierarchical multi-label classification (HMLC) have not previously been considered in any detail. Three hierarchy-related factors in HMLC are examined here: hierarchy structure, class location in the hierarchy, and class distribution in feature space. Four general model groups are found to exist in HMLC modeling: "non-informative", "semi-informative", "comparable", and "hierarchical". Studies of synthetic and real data show that the choice of hierarchy used in the modeling is important in setting the relative error rates of false positives and false negatives. The choice of hierarchy depends upon the relative consequences of the false positive and false negative errors produced by the resulting model. A low false negative error rate results from use of a "comparable" HMLC model with a hierarchy designed to maximize intergroup separation. A low false positive error rate results from use of a "hierarchical" HMLC model using any hierarchy. Modest differences in accuracy and F1 measure occur between the best-performing HMLC models built on several external and internal hierarchies for a complex, multiclass dataset. On the Dalbergia data, HMLC methods using "comparable" and "hierarchical" HMLC models with the phylogenetic hierarchies examined slightly outperform a conventional classification using the same classifier.
Brief: Studies of synthetic and real data show that the structure of the hierarchy in hierarchical, multi-label classification (HMLC) is important in setting the relative error rates of false positives and false negatives. A low false negative error rate results from use of a "comparable" HMLC model with a hierarchy designed to maximize inter-class separation. A low false positive error rate results from use of a "hierarchical" HMLC model with any hierarchy.
Pages: 13