Optimizing different loss functions in multilabel classifications

Cited by: 12
Authors
Díez J. [1 ]
Luaces O. [1 ]
del Coz J.J. [1 ]
Bahamonde A. [1 ]
Affiliations
[1] Artificial Intelligence Center, University of Oviedo, Gijón
Keywords
Multilabel classification; Optimization; Structured outputs; Tensor product
DOI
10.1007/s13748-014-0060-7
Abstract
Multilabel classification (ML) aims to assign a set of labels to each instance. This generalization of multiclass classification requires redefining the loss functions, and the learning tasks become harder. The objective of this paper is to gain insight into the relation between optimization aims and some of the most popular performance measures: subset (or 0/1) loss, Hamming loss, and the example-based F-measure. To make a fair comparison, we implemented three ML learners, each explicitly optimizing one of these measures in a common framework. This can be done by considering a subset of labels as a structured output and then using structured output support vector machines tailored to optimize the given loss function. The paper includes an exhaustive experimental comparison. The conclusion is that in most cases the optimization of the Hamming loss produces the best or competitive scores. This is a practical result, since the Hamming loss can be minimized using a collection of binary classifiers, one for each label separately, making it a scalable and fast method for learning ML tasks. Additionally, we observe that in noise-free learning tasks optimizing the subset loss is the best option, but the differences are very small. We have also noticed that the biggest room for improvement lies in optimizing an F-measure in noisy learning tasks. © 2014, Springer-Verlag Berlin Heidelberg.
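For reference, the three example-based measures named in the abstract are commonly defined as follows. The notation below (true label set y, predicted set h(x), label space L) is assumed for illustration and is not taken from the record itself:

```latex
% Standard example-based loss definitions (notation assumed, not from this record).
% For an instance x with true label set y \subseteq \mathcal{L}
% and predicted set h(x) \subseteq \mathcal{L}:

% Subset (0/1) loss: 1 iff the predicted set is not exactly the true set.
\ell_{0/1}\bigl(y, h(x)\bigr) = [\![\, y \neq h(x) \,]\!]

% Hamming loss: fraction of labels on which y and h(x) disagree
% (\triangle denotes the symmetric difference of the two sets).
\ell_{H}\bigl(y, h(x)\bigr) = \frac{1}{|\mathcal{L}|}\,\bigl|\, y \,\triangle\, h(x) \,\bigr|

% Example-based F-measure: harmonic mean of precision and recall per example.
F_1\bigl(y, h(x)\bigr) = \frac{2\,|y \cap h(x)|}{|y| + |h(x)|}
```

The abstract's scalability claim rests on the fact that the Hamming loss decomposes over labels, so it can be minimized by binary relevance: one independent binary classifier per label. The sketch below illustrates only that decomposition; it is not the paper's structured-output SVM, and the `BinaryRelevance` class and `hamming_loss` helper are hypothetical names built on scikit-learn's `LinearSVC`:

```python
# Binary-relevance sketch: one independent binary classifier per label.
# Training cost grows linearly with the number of labels, which is why
# Hamming-loss optimization is described as scalable and fast.
import numpy as np
from sklearn.svm import LinearSVC


class BinaryRelevance:
    def __init__(self, **svc_params):
        self.svc_params = svc_params
        self.models_ = []

    def fit(self, X, Y):
        # Y is an (n_samples, n_labels) 0/1 indicator matrix; fit one
        # LinearSVC per label column.
        self.models_ = [LinearSVC(**self.svc_params).fit(X, Y[:, j])
                        for j in range(Y.shape[1])]
        return self

    def predict(self, X):
        # Stack the per-label predictions back into an indicator matrix.
        return np.column_stack([m.predict(X) for m in self.models_])


def hamming_loss(Y_true, Y_pred):
    # Fraction of (example, label) positions where truth and prediction differ.
    return float(np.mean(Y_true != Y_pred))


# Usage (shapes only): X_train (n, d), Y_train (n, L) with 0/1 entries.
# br = BinaryRelevance(C=1.0).fit(X_train, Y_train)
# print(hamming_loss(Y_test, br.predict(X_test)))
```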
Pages: 107-118
Number of pages: 12