Compressed labeling on distilled labelsets for multi-label learning

Cited by: 40
Authors
Zhou, Tianyi [3 ]
Tao, Dacheng [3 ]
Wu, Xindong [1 ,2 ]
Affiliations
[1] Hefei Univ Technol, Dept Comp Sci, Hefei 230009, Peoples R China
[2] Univ Vermont, Dept Comp Sci, Burlington, VT 05405 USA
[3] Univ Technol Sydney, Fac Engn & IT, Ctr Quantum Computat & Intelligent Syst QCIS, Broadway, NSW 2007, Australia
Funding
U.S. National Science Foundation;
Keywords
NEURAL-NETWORKS; CLASSIFICATION; ALGORITHMS;
DOI
10.1007/s10994-011-5276-1
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Directly applying single-label classification methods to multi-label learning problems substantially limits both performance and speed due to the imbalance, dependence, and high dimensionality of the given label matrix. Existing methods either ignore these three problems or reduce one at the price of aggravating another. In this paper, we propose a {0,1} label matrix compression and recovery method termed "compressed labeling (CL)" to simultaneously solve, or at least reduce, these three problems. CL first compresses the original label matrix to improve balance and independence by preserving the signs of its Gaussian random projections. Afterward, we directly apply popular binary classification methods (e.g., support vector machines) to each new label. A fast recovery algorithm is developed to recover the original labels from the predicted new labels. Within the recovery algorithm, a "labelset distilling method" is designed to extract distilled labelsets (DLs), i.e., the frequently appearing label subsets in the original labels, via recursive clustering and subtraction. Given a distilled and an original label vector, we discover that the signs of their random projections have an explicit joint distribution that can be quickly computed from a geometric inference. Based on this observation, the original label vector is exactly determined after performing a series of Kullback-Leibler divergence based hypothesis tests on the distribution of the new labels. CL significantly improves the balance of the training samples and reduces the dependence between different labels. Moreover, it accelerates the learning process by training fewer binary classifiers for the compressed labels, and exploits label dependence via DL-based tests. Theoretically, we prove recovery bounds for CL, which verify its effectiveness for label compression and the improvement in multi-label classification performance brought by the label correlations preserved in DLs. We show the effectiveness, efficiency, and robustness of CL via 5 groups of experiments on 21 datasets from text classification, image annotation, scene classification, music categorization, genomics, and web page classification.
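The compression step described in the abstract is simple enough to sketch. The snippet below is an illustrative toy example only, not the authors' implementation: the sizes n, d, k, m, the label density, and the use of a linear SVM are all assumptions. It compresses a {0,1} label matrix by keeping only the signs of its Gaussian random projections, then trains one binary classifier per compressed label; the DL-based recovery step is omitted.

    # A minimal, hypothetical sketch of compressed labeling's first stage:
    # keep the signs of Gaussian random projections of a {0,1} label matrix,
    # then train one binary classifier per compressed label.
    import numpy as np
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(0)

    # Toy data: n samples, d features, k original labels (assumed values).
    n, d, k = 200, 20, 10
    X = rng.normal(size=(n, d))
    Y = (rng.random((n, k)) < 0.2).astype(int)  # sparse {0,1} label matrix

    # Compress k labels into m < k new labels via a Gaussian random matrix A:
    # each new label is the sign of one random projection of the label vector.
    m = 5
    A = rng.normal(size=(m, k))
    Z = (Y @ A.T > 0).astype(int)  # signs of the Gaussian random projections

    # Train one binary classifier (e.g., a linear SVM) per compressed label,
    # as the abstract suggests for the new labels.
    classifiers = [LinearSVC(max_iter=10000).fit(X, Z[:, j]) for j in range(m)]

    # Predict the m compressed labels; the paper's recovery algorithm would
    # then map these back to the original k labels via KL-divergence
    # hypothesis tests over the distilled labelsets (not sketched here).
    Z_pred = np.column_stack([clf.predict(X) for clf in classifiers])
    print(Z_pred.shape)  # (200, 5): m compressed labels instead of k

Note that training m < k classifiers, rather than one per original label, is the source of the speedup the abstract claims, while the sign projection tends to balance the positive and negative classes of each new label.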
Pages: 69-126
Number of pages: 58
Related papers
(50 records in total)
  • [21] The Emerging Trends of Multi-Label Learning
    Liu, Weiwei
    Wang, Haobo
    Shen, Xiaobo
    Tsang, Ivor W.
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (11) : 7955 - 7974
  • [22] Multi-label Learning via Codewords
    Sedghi, Mahlagha
    Huang, Yinjie
    Georgiopoulos, Michael
    Anagnostopoulos, Georgios
    2018 IEEE 30TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI), 2018, : 221 - 228
  • [23] Generative Multi-Label Correlation Learning
    Wang, Lichen
    Ding, Zhengming
    Lee, Kasey
    Han, Seungju
    Han, Jae-Joon
    Choi, Changkyu
    Fu, Yun
    ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2023, 17 (02)
  • [24] Multi-label Quadruplet Dictionary Learning
    Zheng, Jiayu
    Zhu, Wencheng
    Zhu, Pengfei
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2020, PT II, 2020, 12397 : 119 - 131
  • [25] Unconstrained Multimodal Multi-Label Learning
    Huang, Yan
    Wang, Wei
    Wang, Liang
    IEEE TRANSACTIONS ON MULTIMEDIA, 2015, 17 (11) : 1923 - 1935
  • [26] Partial Multi-label Learning using Label Compression
    Yu, Tingting
    Yu, Guoxian
    Wang, Jun
    Domeniconi, Carlotta
    Zhang, Xiangliang
    20TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2020), 2020, : 761 - 770
  • [27] Multi-label learning with kernel local label information
    Fu, Xiaozhen
    Li, Deyu
    Zhai, Yanhui
    EXPERT SYSTEMS WITH APPLICATIONS, 2022, 207
  • [28] Feature selection for multi-label learning with streaming label
    Liu, Jinghua
    Li, Yuwen
    Weng, Wei
    Zhang, Jia
    Chen, Baihua
    Wu, Shunxiang
    NEUROCOMPUTING, 2020, 387 : 268 - 278
  • [29] An efficient multi-label learning method with label projection
    Lin, Luyue
    Liu, Bo
    Zheng, Xin
    Xiao, Yanshan
    Liu, Zhijing
    Cai, Hao
    KNOWLEDGE-BASED SYSTEMS, 2020, 207
  • [30] Multi-Label Learning with Global and Local Label Correlation
    Zhu, Yue
    Kwok, James T.
    Zhou, Zhi-Hua
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2018, 30 (06) : 1081 - 1094