Membership-margin based feature selection for mixed type and high-dimensional data: Theory and applications

Cited by: 16
Authors
Hedjazi, Lyamine [1, 2]
Aguilar-Martin, Joseph [2]
Le Lann, Marie-Veronique [2, 3]
Kempowsky-Hamon, Tatiana [2]
Affiliations
[1] Inst Cardiometab & Nutr, Dept Om Sci, F-75013 Paris, France
[2] CNRS, LAAS, F-31031 Toulouse, France
[3] Univ Toulouse, LAAS, F-31031 Toulouse, France
Keywords
Feature selection; Fuzzy classifier; Mixed-type data; Machine learning; Margin; UNSUPERVISED FEATURE-SELECTION; FEATURE SUBSET-SELECTION; FUZZY-ROUGH SETS; BREAST-CANCER; LEARNING RULE; CLASSIFICATION; ALGORITHM; PREDICTION; WEIGHTS; SYSTEMS;
DOI
10.1016/j.ins.2015.06.007
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
The present paper describes a new feature weighting method based on a membership margin. Distinctive properties of the proposed method include its capability to process problems characterized by mixed-type data (quantitative, qualitative and interval) as well as a huge number of features. The key idea is to map all the features of different types simultaneously into a common space: the membership space. Once all features are represented in a homogeneous space, the feature weighting task can be performed in a unified way. This weighting approach is integrated here within a fuzzy classifier through a weighted fuzzy rule concept in order to improve its performance. Each antecedent fuzzy set in the fuzzy if-then rule is weighted to characterize the importance of each proposition and therefore of its corresponding feature. The weight estimation process relies on membership margin maximization to estimate a fuzzy weight for each feature in the membership space. Experiments on low- and high-dimensional real-world datasets demonstrate that the proposed approach can significantly improve the performance of the fuzzy rule-based classifier as well as other state-of-the-art classifiers, and can even outperform classical feature weighting approaches. In particular, we show that this approach can yield meaningful results on two real-world applications: cancer prognosis and industrial process diagnosis. (C) 2015 Elsevier Inc. All rights reserved.
Pages: 174-196
Page count: 23