An extensive experimental comparison of methods for multi-label learning

Cited by: 536

Authors:

Madjarov, Gjorgji [1,2]

Kocev, Dragi [2]

Gjorgjevikj, Dejan [1]

Dzeroski, Saso [2]

Affiliations:

[1] Ss Cyril & Methodius Univ, Fac Comp Sci & Engn, Skopje 1000, North Macedonia

[2] Jozef Stefan Inst, Dept Knowledge Technol, Ljubljana 1000, Slovenia

Source:

PATTERN RECOGNITION | 2012 / Vol. 45 / Issue 09

Keywords:

Multi-label ranking; Multi-label classification; Comparison of multi-label learning methods; CLASSIFICATION; ALGORITHMS;

DOI:

10.1016/j.patcog.2012.03.004

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Multi-label learning has received significant attention in the research community over the past few years: this has resulted in the development of a variety of multi-label learning methods. In this paper, we present an extensive experimental comparison of 12 multi-label learning methods using 16 evaluation measures over 11 benchmark datasets. We selected the competing methods based on their previous usage by the community, the representation of different groups of methods and the variety of basic underlying machine learning methods. Similarly, we selected the evaluation measures to be able to assess the behavior of the methods from a variety of view-points. In order to make conclusions independent from the application domain, we use 11 datasets from different domains. Furthermore, we compare the methods by their efficiency in terms of time needed to learn a classifier and time needed to produce a prediction for an unseen example. We analyze the results from the experiments using Friedman and Nemenyi tests for assessing the statistical significance of differences in performance. The results of the analysis show that for multi-label classification the best performing methods overall are random forests of predictive clustering trees (RF-PCT) and hierarchy of multi-label classifiers (HOMER), followed by binary relevance (BR) and classifier chains (CC). Furthermore, RF-PCT exhibited the best performance according to all measures for multi-label ranking. The recommendation from this study is that when new methods for multi-label learning are proposed, they should be compared to RF-PCT and HOMER using multiple evaluation measures. (C) 2012 Elsevier Ltd. All rights reserved.


Pages: 3084-3104

Number of pages: 21
