Multilabel all-relevant feature selection using lower bounds of conditional mutual information

Cited by: 5
Authors
Teisseyre, Pawel [1, 2]
Lee, Jaesung [3, 4]
Affiliations
[1] Polish Acad Sci, Inst Comp Sci, Warsaw, Poland
[2] Warsaw Univ Technol, Fac Math & Informat Sci, Warsaw, Poland
[3] Chung Ang Univ, Dept Artificial Intelligence, Seoul, South Korea
[4] Chung Ang Univ, AI ML Res Innovat Ctr, Seoul, South Korea
Keywords
Multilabel data analysis; Feature selection; Information theory; Conditional mutual information; Permutation tests; LABEL FEATURE-SELECTION; EFFICIENT FEATURE-SELECTION; GENE-GENE INTERACTIONS; CLASSIFIER CHAINS; DETECT;
DOI
10.1016/j.eswa.2022.119436
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We consider a multilabel all-relevant feature selection task, which is more general than the classical minimal-optimal subset task. Whereas the goal of minimal-optimal methods is to find the smallest subset of features allowing accurate prediction of labels, the objective of all-relevant methods is to identify all the features that are related to the target labels, including strongly relevant and all weakly relevant features. The all-relevant task has received much interest in fields where discovering the dependency structure between features and target variables is more important than the prediction itself, e.g., in medical and bioinformatics applications. In this paper, we formally describe the all-relevant problem for multilabel classification using an information-theoretic approach. We propose a relevancy score and an efficient method for calculating it based on lower bounds of conditional mutual information. Another practical issue is how to separate the relevant features from the irrelevant ones. To find a threshold, we propose a testing procedure based on a permutation scheme. Finally, empirical evaluation of all-relevant methods requires a specific approach. We consider a large variety of simulated datasets representing different dependency structures and containing various types of interactions. Empirical results on simulated datasets and a large clinical database demonstrate that the proposed method can successfully identify relevant features.
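The abstract names two ingredients: a relevancy score built on conditional mutual information, and a permutation-based test to decide which scores count as relevant. The sketch below illustrates the general idea only and is not the authors' method: it uses a plain plug-in estimate of I(X; Y | Z) for discrete variables rather than the lower bounds proposed in the paper, treats a single label y, and permutes the feature within strata of the conditioning variable. The function names `conditional_mutual_information` and `permutation_p_value`, and the parameters `n_permutations` and `seed`, are hypothetical.

```python
import math
import random
from collections import Counter


def conditional_mutual_information(x, y, z):
    """Plug-in estimate of I(X; Y | Z) in nats for discrete sequences.

    Illustrative assumption: a direct frequency-based estimate, not the
    lower-bound approximation described in the abstract.
    """
    n = len(x)
    n_xyz = Counter(zip(x, y, z))
    n_xz = Counter(zip(x, z))
    n_yz = Counter(zip(y, z))
    n_z = Counter(z)
    cmi = 0.0
    for (xi, yi, zi), c in n_xyz.items():
        # p(x,y,z) * log[ p(z) p(x,y,z) / (p(x,z) p(y,z)) ] written with counts
        ratio = c * n_z[zi] / (n_xz[(xi, zi)] * n_yz[(yi, zi)])
        cmi += (c / n) * math.log(ratio)
    return cmi


def permutation_p_value(x, y, z, n_permutations=200, seed=0):
    """Permutation p-value for the null hypothesis I(X; Y | Z) = 0.

    The feature x is shuffled within each stratum of z, which keeps the
    (x, z) and (y, z) relationships intact while breaking any remaining
    link between x and y.
    """
    rng = random.Random(seed)
    observed = conditional_mutual_information(x, y, z)

    # Group sample indices by the value of the conditioning variable.
    strata = {}
    for i, zi in enumerate(z):
        strata.setdefault(zi, []).append(i)

    exceed = 0
    for _ in range(n_permutations):
        x_perm = list(x)
        for idx in strata.values():
            values = [x[i] for i in idx]
            rng.shuffle(values)
            for i, v in zip(idx, values):
                x_perm[i] = v
        if conditional_mutual_information(x_perm, y, z) >= observed:
            exceed += 1

    # Add-one correction so the p-value is never exactly zero.
    return observed, (exceed + 1) / (n_permutations + 1)
```

Under these assumptions, a feature would be flagged as relevant when its p-value falls below a chosen significance level. A feature that merely duplicates the conditioning variable gets a score of zero, which is how conditioning separates new information from redundancy.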
Pages: 19
Related papers
  • [1] Feature selection by optimizing a lower bound of conditional mutual information
    Peng, Hanyang
    Fan, Yong
    INFORMATION SCIENCES, 2017, 418 : 652 - 667
  • [2] Multilabel Feature Selection Using Mutual Information and ML-ReliefF for Multilabel Classification
    Shi, Enhui
    Sun, Lin
    Xu, Jiucheng
    Zhang, Shiguang
    IEEE ACCESS, 2020, 8 : 145381 - 145400
  • [3] FEATURE SELECTION WITH WEIGHTED CONDITIONAL MUTUAL INFORMATION
    Celik, Ceyhun
    Bilge, Hasan Sakir
    JOURNAL OF THE FACULTY OF ENGINEERING AND ARCHITECTURE OF GAZI UNIVERSITY, 2015, 30 (04) : 585 - 596
  • [4] Streaming Feature Selection for Multilabel Learning Based on Fuzzy Mutual Information
    Lin, Yaojin
    Hu, Qinghua
    Liu, Jinghua
    Li, Jinjin
    Wu, Xindong
    IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2017, 25 (06) : 1491 - 1507
  • [5] Feature Selection with Conditional Mutual Information Considering Feature Interaction
    Liang, Jun
    Hou, Liang
    Luan, Zhenhua
    Huang, Weiping
    SYMMETRY-BASEL, 2019, 11 (07):
  • [6] Fast binary feature selection with conditional mutual information
    Fleuret, F
    JOURNAL OF MACHINE LEARNING RESEARCH, 2004, 5 : 1531 - 1555
  • [7] Feature selection based on weighted conditional mutual information
    Zhou, Hongfang
    Wang, Xiqian
    Zhang, Yao
    APPLIED COMPUTING AND INFORMATICS, 2024, 20 (1/2) : 55 - 68
  • [8] Feature Selection in Regression Tasks Using Conditional Mutual Information
    Latorre Carmona, Pedro
    Sotoca, Jose M.
    Pla, Filiberto
    Phoa, Frederick K. H.
    Dias, Jose Bioucas
    PATTERN RECOGNITION AND IMAGE ANALYSIS: 5TH IBERIAN CONFERENCE, IBPRIA 2011, 2011, 6669 : 224 - 231
  • [9] Mutual information-based feature selection for multilabel classification
    Doquire, Gauthier
    Verleysen, Michel
    NEUROCOMPUTING, 2013, 122 : 148 - 155
  • [10] Multilabel feature selection using ML-ReliefF and neighborhood mutual information for multilabel neighborhood decision systems
    Sun, Lin
    Yin, Tengyu
    Ding, Weiping
    Qian, Yuhua
    Xu, Jiucheng
    INFORMATION SCIENCES, 2020, 537 : 401 - 424