Feature selection using Fisher score and multilabel neighborhood rough sets for multilabel classification

Cited by: 143
Authors
Sun, Lin [1 ,3 ,4 ]
Wang, Tianxiang [1 ]
Ding, Weiping [2 ]
Xu, Jiucheng [1 ,4 ]
Lin, Yaojin [3 ]
Affiliations
[1] Henan Normal Univ, Coll Comp & Informat Engn, Xinxiang 453007, Henan, Peoples R China
[2] Nantong Univ, Sch Informat Sci & Technol, Nantong 226019, Peoples R China
[3] Minnan Normal Univ, Key Lab Data Sci & Intelligence Applicat, Zhangzhou 363000, Peoples R China
[4] Key Lab Artificial Intelligence & Personalized Le, Xinxiang 453007, Henan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Feature selection; Neighborhood rough sets; Fisher Score; Multilabel classification; LABEL FEATURE-SELECTION; UNCERTAINTY MEASURES; INFORMATION;
DOI
10.1016/j.ins.2021.08.032
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
In recent years, feature selection for multilabel classification has attracted attention in machine learning and data mining. However, some feature selection methods ignore the correlations among labels, resulting in low performance, and most of them face challenges in determining an appropriate neighborhood radius for neighborhood systems and suffer from expensive time cost. To overcome these issues, we propose a novel feature selection method using Fisher score and multilabel neighborhood rough sets (MNRS) in multilabel neighborhood decision systems. First, to identify the correlations between labels under a binary distribution, two types of new mutual information between labels are considered, and their balance coefficients are defined. By enhancing strong correlations and weakening weak correlations between labels, a mutual information-based Fisher score model with a second-order correlation between labels is designed to fit multilabel data. Second, to address the problem of automatically choosing a neighborhood radius, a subset of heterogeneous and homogeneous samples is employed to develop a new classification margin as a neighborhood radius, and some concepts of neighborhood, neighborhood class, and upper and lower approximations are formulated for multilabel neighborhood decision systems. The weight and dependency degree are presented to effectively measure the uncertainty of samples in multilabel neighborhood decision systems. Thus, we further present a new classification margin-based MNRS model. Finally, a filter-wrapper preprocessing algorithm for feature selection using the improved Fisher score model is proposed to decrease the spatiotemporal complexity of multilabel data, and a heuristic feature selection algorithm is designed to improve classification performance on multilabel datasets.
Experimental results on thirteen multilabel datasets show that the proposed algorithm is effective in selecting significant features, demonstrating its excellent classification ability in multilabel datasets. (c) 2021 Elsevier Inc. All rights reserved.
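For orientation, the classical single-label Fisher score that the abstract builds on can be sketched as follows. This is only the textbook criterion (between-class scatter divided by within-class scatter, per feature), not the mutual information-based multilabel variant proposed in the paper; the function name `fisher_score`, the small stabilizing constant, and the toy data are assumptions made for illustration.

```python
import numpy as np

def fisher_score(X, y):
    """Classical Fisher score per feature (illustrative, single-label only)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])  # class-size-weighted spread of class means
    within = np.zeros(X.shape[1])   # class-size-weighted variance inside classes
    for c in np.unique(y):
        Xc = X[y == c]
        n_c = Xc.shape[0]
        between += n_c * (Xc.mean(axis=0) - overall_mean) ** 2
        within += n_c * Xc.var(axis=0)
    # Small constant (an assumption) avoids division by zero for constant features.
    return between / (within + 1e-12)

# Toy data (hypothetical): feature 0 separates the classes well, feature 1 does not.
X = np.array([[0.0, 1.0], [0.2, 3.0], [1.0, 2.0], [1.2, 4.0]])
y = np.array([0, 0, 1, 1])
scores = fisher_score(X, y)
ranking = np.argsort(scores)[::-1]  # feature indices, most discriminative first
```

A filter step of the kind the abstract mentions would keep the top-ranked features before a more expensive search; how the paper weights label correlations into this score is not reproduced here.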
Pages: 887-912
Number of pages: 26
Related papers
50 in total
  • [31] FEATURE SELECTION AND IMAGE CLASSIFICATION USING ROUGH SETS THEORY
    Aguiar Pessoa, Alex Sandro
    Stephany, Stephan
    Garcia Fonseca, Leila Maria
    2011 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS), 2011, : 2904 - 2907
  • [32] Feature Selection and Classification of Protein Subfamilies Using Rough Sets
    Rahman, Shuzlina Abdul
    Abu Bakar, Azuraliza
    Hussein, Zeti Azura Mohamed
    2009 INTERNATIONAL CONFERENCE ON ELECTRICAL ENGINEERING AND INFORMATICS, VOLS 1 AND 2, 2009, : 32 - 35
  • [33] Fast Multilabel Feature Selection via Global Relevance and Redundancy Optimization
    Zhang, Jia
    Lin, Yidong
    Jiang, Min
    Li, Shaozi
    Tang, Yong
    Long, Jinyi
    Weng, Jian
    Tan, Kay Chen
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (04) : 5721 - 5734
  • [34] Streaming Feature Selection for Multilabel Learning Based on Fuzzy Mutual Information
    Lin, Yaojin
    Hu, Qinghua
    Liu, Jinghua
    Li, Jinjin
    Wu, Xindong
    IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2017, 25 (06) : 1491 - 1507
  • [35] A PSO-based multi-objective multilabel feature selection method in classification
    Zhang, Yong
    Gong, Dun-wei
    Sun, Xiao-yan
    Guo, Yi-nan
    SCIENTIFIC REPORTS, 2017, 7
  • [36] Multilabel all-relevant feature selection using lower bounds of conditional mutual information
    Teisseyre, Pawel
    Lee, Jaesung
    EXPERT SYSTEMS WITH APPLICATIONS, 2023, 216
  • [37] Multilabel voice disorder classification using raw waveforms
    Disken, Gokay
    TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, 2024, 32 (04) : 590 - 604
  • [38] Memetic feature selection for multilabel text categorization using label frequency difference
    Lee, Jaesung
    Yu, Injun
    Park, Jaegyun
    Kim, Dae-Won
    INFORMATION SCIENCES, 2019, 485 : 263 - 280
  • [39] Multilabel Feature Selection Based on Relative Discernibility Pair Matrix
    Yao, Erliang
    Li, Deyu
    Zhai, Yanhui
    Zhang, Chao
    IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2022, 30 (07) : 2388 - 2401
  • [40] Feature selection for multilabel classification with missing labels via multi-scale fusion fuzzy uncertainty measures
    Yin, Tengyu
    Chen, Hongmei
    Wang, Zhihong
    Liu, Keyu
    Yuan, Zhong
    Horng, Shi-Jinn
    Li, Tianrui
    PATTERN RECOGNITION, 2024, 154