ReliefF Based Pruning Model for Multi-Label Classification

Authors
Liu H.-Y. [1]
Wang Z.-H. [1]
Zhang Z.-D. [1]
Affiliation
[1] School of Computer and Information Technology, Beijing Jiaotong University, Beijing
Funding
National Natural Science Foundation of China
Keywords
Feature selection; Label dependence; Multi-label classification; ReliefF; Stacking;
DOI
10.11897/SP.J.1016.2019.00483
Abstract
Multi-label classification (MLC) is a machine learning problem in which a model assigns a subset of labels to each instance. MLC is receiving increasing attention and is relevant to many domains, such as text categorization, classification of music and videos, and semantic annotation of images. Many recent studies seek efficient and accurate algorithms for multi-label classification. These algorithms are usually partitioned into two main categories: algorithm adaptation and problem transformation. In multi-label classification, labels do not occur independently of each other; instead, there are statistical dependencies between them. It is now commonly accepted that exploiting the dependencies between labels is the key to improving multi-label classification performance. In this paper, we divide the methods that exploit label dependencies into two groups according to how they transform the problem: label grouping models and feature space extending models. A label grouping model groups labels into several label subsets, based on certain strategies or criteria, in order to incorporate label dependencies. A feature space extending model instead extends the feature space of the binary classifiers so that they can discover existing label dependencies by themselves. We find that the common difficulty for both kinds of model is how to measure the dependencies between labels accurately. In particular, for a feature space extending model, choosing the proper labels with which to extend the original feature space is the key to improving classification performance. On this basis, we propose a ReliefF based pruning model for multi-label classification (ReliefF based Stacking, RFS). RFS measures the dependencies between labels from a feature selection perspective and then adds the more relevant labels to the original feature space.
A stacking based algorithm is used during training and prediction. The contribution of this algorithm is threefold: (1) It provides a new method to measure the dependencies between labels. Unlike existing methods, which measure pair-wise label dependencies, our ReliefF based method takes into account the effect of all interacting labels. (2) Instead of extending the original feature space with all labels, we choose only the closely related labels. This reduces noise in the data and avoids the adverse effects caused by irrelevant labels. (3) In the feature selection phase, we design a new strategy that treats original features and label features as the same kind of feature and selects them together. Our empirical study is divided into two parts: a systematic study of the parameters of our algorithm and a comparative study between our proposal and other multi-label classification algorithms. The first part discusses the effects of parameters, feature selection strategies and base classifiers on RFS. In the second part, experimental results based on 6 evaluation measures on 9 multi-label benchmark datasets show that RFS is more effective than other advanced multi-label classification algorithms. © 2019, Science Press. All rights reserved.
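The abstract does not give the algorithmic details of RFS, so the following is only a minimal sketch of the general idea it describes: score the original features and the other labels together with a Relief-style weight, then keep the top-ranked columns as the extended feature space for one target label. It assumes binary features and labels, Hamming distance, and the basic Relief update with a single nearest hit and nearest miss rather than full ReliefF with k neighbours. The function names `relief_weights` and `select_extension` and the toy data are hypothetical, and the stacking step that trains a base classifier on the extended space is omitted.

```python
def relief_weights(rows, target):
    """Basic Relief weights for binary columns against a binary target.

    Simplifying assumption: one nearest hit and one nearest miss per
    instance (full ReliefF averages over k neighbours instead).
    """
    n_cols = len(rows[0])
    weights = [0.0] * n_cols

    def dist(a, b):
        # Hamming distance between two binary rows.
        return sum(x != y for x, y in zip(a, b))

    for i, row in enumerate(rows):
        hits = [r for j, r in enumerate(rows) if j != i and target[j] == target[i]]
        misses = [r for j, r in enumerate(rows) if target[j] != target[i]]
        if not hits or not misses:
            continue
        near_hit = min(hits, key=lambda r: dist(row, r))
        near_miss = min(misses, key=lambda r: dist(row, r))
        for c in range(n_cols):
            # Reward columns that separate classes, penalise columns
            # that differ within the same class.
            weights[c] += (row[c] != near_miss[c]) - (row[c] != near_hit[c])
    return [w / len(rows) for w in weights]


def select_extension(X, Y, j, k):
    """Rank original features and the other labels together for label j.

    Returns the k best columns as ("feature", index) or ("label", index).
    """
    n_features = len(X[0])
    others = [l for l in range(len(Y[0])) if l != j]
    # Candidate space: original features followed by all other labels.
    rows = [list(x) + [y[l] for l in others] for x, y in zip(X, Y)]
    target = [y[j] for y in Y]
    weights = relief_weights(rows, target)
    names = [("feature", f) for f in range(n_features)]
    names += [("label", l) for l in others]
    ranked = sorted(range(len(weights)), key=lambda c: weights[c], reverse=True)
    return [names[c] for c in ranked[:k]]


# Toy data (hypothetical): label 1 always equals label 0, while the
# single feature and label 2 are unrelated to label 0.
X = [[0], [0], [1], [1], [0], [0], [1], [1]]
Y = [[0, 0, 0], [0, 0, 1], [0, 0, 0], [0, 0, 1],
     [1, 1, 0], [1, 1, 1], [1, 1, 0], [1, 1, 1]]

selected = select_extension(X, Y, j=0, k=1)
```

On this toy data `selected` is `[("label", 1)]`: the dependent label outranks both the uninformative feature and the unrelated label, which is the kind of pruning the abstract motivates.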
Pages: 483-496
Page count: 13
Related papers
22 in total
  • [1] Tsoumakas G., Katakis I., Vlahavas I., Mining multi-label data, Data Mining and Knowledge Discovery Handbook, pp. 667-685, (2009)
  • [2] Katakis I., Tsoumakas G., Vlahavas I., Multilabel text classification for automated tag suggestion, Proceedings of the ECML/PKDD 2008 Discovery Challenge, pp. 75-83, (2008)
  • [3] Turnbull D., Barrington L., Torres D., et al., Semantic annotation and retrieval of music and sound effects, IEEE Transactions on Audio, Speech, and Language Processing, 16, 2, pp. 467-476, (2008)
  • [4] Snoek C.G.M., Worring M., Van Gemert J.C., et al., The challenge problem for automated detection of 101 semantic concepts in multimedia, Proceedings of the 14th Annual ACM International Conference on Multimedia, pp. 421-430, (2006)
  • [5] Yang S., Kim S.K., Ro Y.M., Semantic home photo categorization, IEEE Transactions on Circuits and Systems for Video Technology, 17, 3, pp. 324-335, (2007)
  • [6] Zhang M.L., Zhou Z.H., A review on multi-label learning algorithms, IEEE Transactions on Knowledge and Data Engineering, 26, 8, pp. 1819-1837, (2014)
  • [7] Dembczynski K., Waegeman W., Cheng W., et al., On label dependence and loss minimization in multi-label classification, Machine Learning, 88, 1-2, pp. 5-45, (2012)
  • [8] Tsoumakas G., Katakis I., Vlahavas I., Random k-labelsets for multilabel classification, IEEE Transactions on Knowledge and Data Engineering, 23, 7, pp. 1079-1089, (2011)
  • [9] Rokach L., Schclar A., Itach E., Ensemble methods for multi-label classification, Expert Systems with Applications, 41, 16, pp. 7507-7523, (2014)
  • [10] Read J., Pfahringer B., Holmes G., Multi-label classification using ensembles of pruned sets, Proceedings of the 8th IEEE International Conference on Data Mining (ICDM'08), pp. 995-1000, (2008)