Compressed labeling on distilled labelsets for multi-label learning

Cited: 40
Authors
Zhou, Tianyi [3 ]
Tao, Dacheng [3 ]
Wu, Xindong [1 ,2 ]
Affiliations
[1] Hefei Univ Technol, Dept Comp Sci, Hefei 230009, Peoples R China
[2] Univ Vermont, Dept Comp Sci, Burlington, VT 05405 USA
[3] Univ Technol Sydney, Fac Engn & IT, Ctr Quantum Computat & Intelligent Syst QCIS, Broadway, NSW 2007, Australia
Funding
National Science Foundation (USA);
Keywords
NEURAL-NETWORKS; CLASSIFICATION; ALGORITHMS;
DOI
10.1007/s10994-011-5276-1
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Directly applying single-label classification methods to multi-label learning problems substantially limits both performance and speed, due to the imbalance, dependence, and high dimensionality of the given label matrix. Existing methods either ignore these three problems or reduce one at the price of aggravating another. In this paper, we propose a {0,1} label matrix compression and recovery method, termed "compressed labeling (CL)", to simultaneously solve, or at least reduce, these three problems. CL first compresses the original label matrix to improve balance and independence by preserving the signs of its Gaussian random projections. Afterward, popular binary classification methods (e.g., support vector machines) are applied directly to each new label. A fast recovery algorithm is developed to recover the original labels from the predicted new labels. Within the recovery algorithm, a "labelset distilling method" extracts distilled labelsets (DLs), i.e., the frequently appearing label subsets in the original labels, via recursive clustering and subtraction. Given a distilled and an original label vector, we show that the signs of their random projections follow an explicit joint distribution that can be quickly computed by a geometric inference. Based on this observation, the original label vector is exactly determined after a series of Kullback-Leibler divergence based hypothesis tests on the distributions of the new labels. CL significantly improves the balance of the training samples and reduces the dependence between different labels. Moreover, it accelerates learning by training fewer binary classifiers for the compressed labels, and it exploits label dependence via the DL-based tests. Theoretically, we prove recovery bounds for CL, which verify its effectiveness for label compression and the improvement in multi-label classification brought by the label correlations preserved in the DLs. We demonstrate the effectiveness, efficiency, and robustness of CL in 5 groups of experiments on 21 datasets from text classification, image annotation, scene classification, music categorization, genomics, and web page classification.
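As a rough illustration of the compression step described in the abstract, here is a minimal Python sketch, assuming a {0,1} label matrix Y of shape (n_samples, n_labels) and a user-chosen compressed dimension m; the function name compress_labels is hypothetical and not from the paper's code. Only the forward compression is shown; the DL-based recovery stage is considerably more involved.

```python
import numpy as np

def compress_labels(Y, m, seed=0):
    """Sketch of the CL compression step: keep only the signs of
    Gaussian random projections of each {0,1} label vector.

    Y : (n_samples, n_labels) binary label matrix
    m : number of compressed labels (typically m < n_labels)
    Returns a (n_samples, m) binary matrix of new labels plus the
    projection matrix, which the recovery stage would need.
    """
    rng = np.random.default_rng(seed)
    n_labels = Y.shape[1]
    G = rng.standard_normal((n_labels, m))  # Gaussian random projection
    Z = (Y @ G > 0).astype(int)             # sign pattern = new binary labels
    return Z, G

# One binary classifier (e.g., an SVM) is then trained per compressed
# label, so only m classifiers are needed instead of n_labels.
```

A standard fact behind the geometric inference mentioned in the abstract: for a standard Gaussian vector g and any two nonzero vectors u and v, Pr[sign(g·u) != sign(g·v)] = angle(u, v) / pi, which is the kind of closed-form relation that makes the joint sign distribution quick to compute.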
Pages: 69 - 126
Page count: 58
Related papers
50 items in total
  • [31] A relative labeling importance estimation algorithm based on global-local label correlations for multi-label learning
    Liu, Yilu
    Cao, Fuyuan
    APPLIED INTELLIGENCE, 2023, 53 (05) : 4940 - 4958
  • [32] Multi-label Active Learning Based on Maximum Correntropy Criterion: Towards Robust and Discriminative Labeling
    Wang, Zengmao
    Du, Bo
    Zhang, Lefei
    Zhang, Liangpei
    Fang, Meng
    Tao, Dacheng
    COMPUTER VISION - ECCV 2016, PT III, 2016, 9907 : 453 - 468
  • [33] Multi-label learning: a review of the state of the art and ongoing research
    Gibaja, Eva
    Ventura, Sebastian
    WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY, 2014, 4 (06) : 411 - 444
  • [34] Multi-label Learning based on Label Entropy Guided Clustering
    Zhang, Ju-Jie
    Fang, Min
    Li, Xiao
    2014 IEEE INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY (CIT), 2014, : 756 - 760
  • [35] Imbalance multi-label data learning with label specific features
    Rastogi, Reshma
    Mortaza, Sayed
    NEUROCOMPUTING, 2022, 513 : 395 - 408
  • [36] Partial multi-label learning via specific label disambiguation
    Li, Feng
    Shi, Shengfei
    Wang, Hongzhi
    KNOWLEDGE-BASED SYSTEMS, 2022, 250
  • [37] Leveraging Supervised Label Dependency Propagation for Multi-label Learning
    Fu, Bin
    Xu, Guandong
    Wang, Zhihai
    Cao, Longbing
    2013 IEEE 13TH INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2013, : 1061 - 1066
  • [38] LIFT: Multi-Label Learning with Label-Specific Features
    Zhang, Min-Ling
    Wu, Lei
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2015, 37 (01) : 107 - 120
  • [39] Robust Learning of Multi-Label Classifiers under Label Noise
    Kumar, Himanshu
    Manwani, Naresh
    Sastry, P. S.
    PROCEEDINGS OF THE 7TH ACM IKDD CODS AND 25TH COMAD (CODS-COMAD 2020), 2020, : 90 - 97
  • [40] Discovering and Exploiting Deterministic Label Relationships in Multi-Label Learning
    Papagiannopoulou, Christina
    Tsoumakas, Grigorios
    Tsamardinos, Ioannis
    KDD'15: PROCEEDINGS OF THE 21ST ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2015, : 915 - 924