Mutually-exclusive-and-collectively-exhaustive feature selection scheme

Cited by: 43
Authors
Lee, Chia-Yen [1]
Chen, Bo-Syun [1]
Affiliations
[1] Natl Cheng Kung Univ, Inst Mfg Informat Syst, Tainan 701, Taiwan
Keywords
Feature selection; Mutually-exclusive-and-collectively-exhaustive; Data mining; Semiconductor manufacturing; Bioinformatics
DOI
10.1016/j.asoc.2017.04.055
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In the fields of machine learning and data mining, feature selection methods are used to identify the most cost-effective predictors and to give a deeper understanding of pattern recognition and extraction. This study proposes a novel mutually-exclusive-and-collectively-exhaustive (MECE) feature selection scheme. Based on the MECE principle in decision science, the scheme, which has three stages including evaluation of independence, evaluation of importance and evaluation of completeness, aims to identify the independent and important variables with complete information. A case study of fault classification in semiconductor manufacturing and a study of breast cancer relapse identification in bioinformatics are used to validate the proposed scheme. The results demonstrate that the proposed MECE scheme selects fewer variables, avoids the multicollinearity problem, and improves fault classification accuracy in the two case studies. © 2017 Elsevier B.V. All rights reserved.
Pages: 961-971
Number of pages: 11
Related papers
31 in total
[1]  
[Anonymous], 1980, J Roy Stat Soc: Ser C (Appl Stat), DOI 10.2307/2986296
[2]   Bayesian Variable Selection for Probit Mixed Models Applied to Gene Selection [J].
Baragatti, Meli .
BAYESIAN ANALYSIS, 2011, 6 (02) :209-229
[3]   Mutual information based input feature selection for classification problems [J].
Cang, Shuang ;
Yu, Hongnian .
DECISION SUPPORT SYSTEMS, 2012, 54 (01) :691-698
[4]   Random forests for genomic data analysis [J].
Chen, Xi ;
Ishwaran, Hemant .
GENOMICS, 2012, 99 (06) :323-329
[5]  
Dash M., 1997, Intelligent Data Analysis, V1
[6]   Semiconductor Manufacturing Process Monitoring Based on Adaptive Substatistical PCA [J].
Ge, Zhiqiang ;
Song, Zhihuan .
IEEE TRANSACTIONS ON SEMICONDUCTOR MANUFACTURING, 2010, 23 (01) :99-108
[7]  
Guyon I., 2003, Journal of Machine Learning Research, V3, P1157, DOI 10.1162/153244303322753616
[8]  
Han J., 2012, Data Mining, P393, DOI 10.1016/B978-0-12-381479-1.00009-5
[9]  
Hastie T., 2009, ELEMENTS STAT LEARNI, DOI 10.1007/978-0-387-84858-7
[10]   ANALYSIS AND SELECTION OF VARIABLES IN LINEAR-REGRESSION [J].
HOCKING, RR .
BIOMETRICS, 1976, 32 (01) :1-49