ML-KnockoffGAN: Deep online feature selection for multi-label learning

Cited by: 6
Authors
Paul, Dipanjyoti [1]
Bardhan, Snigdha [1]
Saha, Sriparna [1]
Mathew, Jimson [1]
Affiliations
[1] Indian Inst Technol, Patna, India
Keywords
Online feature selection (OFS); Knockoff features; Multi-label data; False discovery rate (FDR); Generative adversarial nets (GAN); Relevancy power; STREAMING FEATURE-SELECTION; CLASSIFICATION; OPTIMIZATION; INFORMATION; REDUNDANCY;
DOI
10.1016/j.knosys.2023.110548
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Many online platforms now generate data in a streaming manner, resulting in the continuous production of new features. Multi-label data generation has also surged in recent years, making feature selection for online multi-label data essential. However, existing feature selection methods are mainly based on single-label data or offline selection approaches. Only a few methods exist for multi-label data in an online framework, and most of these use classical or evolutionary techniques, paying little attention to deep learning. In this study, we propose a novel deep-learning feature selection technique that utilizes generative adversarial nets (GANs). We develop a framework, called ML-KnockoffGAN, which generates knockoff features in a multi-label setting and then selects features by considering the generated knockoff features and the real features together. As the features arrive online in a continuous fashion, our proposed method incorporates online features and selects them in a group-wise manner. We tested our method on various multi-label data sets from different domains, including text, biology, and audio, and our results show that our approach outperforms existing methods, with an average improvement of 7.1-16.3% for all evaluation metrics. Our method also illustrates the benefits of deep learning techniques in reusing existing trained parameters to train on new windows of features, requiring fewer epochs. (c) 2023 Elsevier B.V. All rights reserved.
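The abstract does not spell out how real and knockoff features are compared, so the following is only a minimal sketch of the generic knockoff+ selection rule from the knockoff-filter literature, not ML-KnockoffGAN itself. It assumes that some model has already produced one importance score per real feature and one per knockoff copy; the function names `knockoff_threshold` and `select_features`, the score arrays, and the target FDR level `q` are all hypothetical choices made for illustration.

```python
import numpy as np


def knockoff_threshold(w, q=0.1):
    """Knockoff+ threshold: the smallest t such that
    (1 + #{j: W_j <= -t}) / max(1, #{j: W_j >= t}) <= q.
    Returns inf when no candidate t meets the target level q."""
    w = np.asarray(w, dtype=float)
    # Candidate thresholds are the nonzero magnitudes of the statistics.
    candidates = np.sort(np.abs(w[w != 0]))
    for t in candidates:
        fdp_estimate = (1 + np.sum(w <= -t)) / max(1, np.sum(w >= t))
        if fdp_estimate <= q:
            return t
    return np.inf


def select_features(real_importance, knockoff_importance, q=0.1):
    """Return indices of features whose real importance beats the
    importance of their knockoff copy by at least the threshold."""
    # W_j > 0 means the real feature looks more useful than its knockoff.
    w = np.abs(real_importance) - np.abs(knockoff_importance)
    t = knockoff_threshold(w, q)
    return np.flatnonzero(w >= t)


# Illustrative scores for one window of seven features (assumed values).
real = [6, 5, 4, 3.5, 3, 0.5, 1]
knockoff = [1, 1, 0.5, 0.5, 1, 1.5, 0.5]
selected = select_features(real, knockoff, q=0.25)
```

In an online setting such as the one the abstract describes, a rule like this could be applied to each newly arrived group of features in turn; how the paper actually scores and thresholds features would need to be checked against the full text.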
Pages: 12
Related papers
53 in total
  • [1] Simultaneous feature selection and clustering of micro-array and RNA-sequence gene expression data using multiobjective optimization
    Alok, Abhay Kumar
    Gupta, Pooja
    Saha, Sriparna
    Sharma, Vineet
    [J]. INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2020, 11 (11) : 2541 - 2563
  • [2] Belghazi MI, 2018, PR MACH LEARN RES, V80
  • [3] Candes E.J., 2016, Panning for gold: Model-free knockoffs for high-dimensional controlled variable selection, V1610
  • [4] Novel multi-label feature selection via label symmetric uncertainty correlation learning and feature redundancy evaluation
    Dai, Jianhua
    Chen, Jiaolong
    Liu, Ye
    Hu, Hu
    [J]. KNOWLEDGE-BASED SYSTEMS, 2020, 207
  • [5] MULTIPLE COMPARISONS AMONG MEANS
    DUNN, OJ
    [J]. JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1961, 56 (293) : 52 - &
  • [6] Online streaming feature selection using rough sets
    Eskandari, S.
    Javidi, M. M.
    [J]. INTERNATIONAL JOURNAL OF APPROXIMATE REASONING, 2016, 69 : 35 - 57
  • [7] Multi-label feature selection based on label correlations and feature redundancy
    Fan, Yuling
    Chen, Baihua
    Huang, Weiqin
    Liu, Jinghua
    Weng, Wei
    Lan, Weiyao
    [J]. KNOWLEDGE-BASED SYSTEMS, 2022, 241
  • [8] A comparison of alternative tests of significance for the problem of m rankings
    Friedman, M
    [J]. ANNALS OF MATHEMATICAL STATISTICS, 1940, 11 : 86 - 92
  • [9] Distributed Selection of Continuous Features in Multilabel Classification Using Mutual Information
    Gonzalez-Lopez, Jorge
    Ventura, Sebastian
    Cano, Alberto
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2020, 31 (07) : 2280 - 2293
  • [10] Hatami M, 2020, IRAN CONF ELECTR ENG, P1589