Biclustering-based multi-label classification

Cited by: 4
Authors
Schmitke, Luiz Rafael [1]
Paraiso, Emerson Cabrera [1]
Nievola, Julio Cesar [1]
Affiliation
[1] Pontificia Univ Catolica Parana, Grad Program Informat, Rua Imaculada Conceicao 1155, BR-80215901 Curitiba, Parana, Brazil
Keywords
Machine learning; Multi-label classification; Problem transformation; Biclustering
DOI
10.1007/s10115-024-02109-3
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In multi-label classification, an instance can have multiple labels simultaneously. The two main approaches to this problem are transforming the multi-label data and adapting single-label algorithms to multi-label data. Although problem transformation is effective, some algorithms use fixed parameters to determine the number of subproblems, and they preserve label relationships without using correlation or co-occurrence measures. This work adopts the approach that converts a multi-label problem into multiple binary subproblems, because its low execution time enables the use of complex single-label algorithms during classification. However, this approach performs poorly on multi-label metrics. The BicbPT algorithm is therefore introduced; it combines biclustering with the multi-label to binary problem transformation to improve performance on multi-label metrics without increasing the transformation's running time. For the evaluation, BicbPT was compared with the algorithms BR, CC, ECC, RAkEL and LP. The single-label algorithms SVM, C4.5 and Naive Bayes were applied to classify the binary subproblems across 12 datasets. The experiments demonstrate that BicbPT performed better on the multi-label metrics than the other multi-label to binary algorithms, with only ECC achieving similar results. However, the running time of ECC is up to 10 times higher, which favors BicbPT. BicbPT also keeps its running time similar to that of the other algorithms in the multi-label to binary category. Finally, the experiments indicate that exploiting the way labels influence each other can improve multi-label classification, rather than merely preserving label relationships as other approaches do.
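The multi-label to binary transformation described in the abstract can be illustrated with a minimal sketch of Binary Relevance (BR), one of the baselines named there. This is not the BicbPT algorithm, whose details the abstract does not give. The base learner is a toy nearest-centroid classifier chosen only to keep the example dependency-free, and all names (`NearestCentroid`, `BinaryRelevance`, `hamming_loss`, the sample data) are hypothetical.

```python
class NearestCentroid:
    """Toy binary base learner (assumption): predicts the class with the closer centroid."""

    def fit(self, X, y):
        self.centroids = {}
        for cls in (0, 1):
            rows = [x for x, label in zip(X, y) if label == cls]
            if rows:  # skip a class that never occurs for this label
                self.centroids[cls] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, X):
        def sq_dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))

        return [
            min(self.centroids, key=lambda c: sq_dist(x, self.centroids[c]))
            for x in X
        ]


class BinaryRelevance:
    """Trains one independent binary classifier per label column."""

    def __init__(self, base_factory=NearestCentroid):
        self.base_factory = base_factory

    def fit(self, X, Y):
        n_labels = len(Y[0])
        # One binary subproblem per label: column j of Y becomes the target.
        self.models = [
            self.base_factory().fit(X, [row[j] for row in Y])
            for j in range(n_labels)
        ]
        return self

    def predict(self, X):
        per_label = [model.predict(X) for model in self.models]
        # Transpose from per-label lists back to one label vector per instance.
        return [list(row) for row in zip(*per_label)]


def hamming_loss(Y_true, Y_pred):
    """Fraction of instance-label pairs that are predicted incorrectly."""
    total = sum(len(row) for row in Y_true)
    wrong = sum(
        t != p
        for row_t, row_p in zip(Y_true, Y_pred)
        for t, p in zip(row_t, row_p)
    )
    return wrong / total


# Hypothetical data: label 0 follows feature 0, label 1 follows feature 1.
X_train = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
Y_train = [[0, 0], [0, 1], [1, 0], [1, 1]]

br = BinaryRelevance().fit(X_train, Y_train)
predictions = br.predict([[0.9, 0.1], [0.1, 0.8]])
print(predictions)
print(hamming_loss(Y_train, br.predict(X_train)))
```

Because each label is learned in isolation, BR ignores any relationship between labels, which is the kind of limitation the abstract says BicbPT addresses through biclustering.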
Pages: 4861-4898
Page count: 38