Hierarchical Feature Selection Based on Label Distribution Learning

Cited by: 40
Authors
Lin, Yaojin [1]
Liu, Haoyang [1]
Zhao, Hong [1]
Hu, Qinghua [2]
Zhu, Xingquan [3]
Wu, Xindong [4]
Affiliations
[1] Minnan Normal Univ, Sch Comp Sci, Key Lab Data Sci & Intelligence Applicat, Zhangzhou 363000, Fujian, Peoples R China
[2] Tianjin Univ, Sch Comp Sci, Tianjin 300354, Peoples R China
[3] Florida Atlantic Univ, Dept Elect Engn & Comp Sci, Boca Raton, FL 33431 USA
[4] Hefei Univ Technol, Key Lab Knowledge Engn Big Data, Minist Educ, Hefei 230009, Anhui, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Feature extraction; Task analysis; Correlation; Electronic mail; Training; Dinosaurs; Computer science; Common and label-specific features; feature selection; hierarchical classification; label distribution learning; label enhancement; CLASSIFICATION;
DOI
10.1109/TKDE.2022.3177246
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Hierarchical classification learning, which organizes data categories into a hierarchical structure, is an effective approach for large-scale classification tasks. The high dimensionality of the data feature space, represented in hierarchical class structures, is one of the main research challenges. In addition, the class hierarchy often introduces imbalanced class distributions and causes overfitting. In this paper, we propose a feature selection method based on label distribution learning to address the above challenges. The crux is to alleviate the class imbalance problem and learn a discriminative feature subset for the hierarchical classification process. Because categories in the hierarchical tree structure are correlated, sibling categories can provide additional supervisory information for each learning subtask, which, in turn, alleviates the problem of under-sampling of minority categories. Therefore, we transform hierarchical labels into a hierarchical label distribution to represent this correlation. After that, a discriminative feature subset is selected recursively, under common-feature and label-specific-feature constraints, to ensure that downstream classification tasks can achieve the best performance. Experiments and comparisons with seven well-established feature selection algorithms on six real data sets with different degrees of imbalance demonstrate the superiority of the proposed method.
Pages: 5964-5976
Number of pages: 13