On the influence of feature selection in fuzzy rule-based regression model generation

Cited by: 32
Authors
Antonelli, Michela [1]
Ducange, Pietro [1]
Marcelloni, Francesco [1]
Segatori, Armando [1]
Affiliations
[1] Univ Pisa, Dipartimento Ingn Informaz, I-56122 Pisa, Italy
Keywords
Fuzzy rule-based systems; Feature selection; Multi-objective evolutionary fuzzy rule-based systems; Fuzzy mutual information; Regression problems; High dimensional datasets; MUTUAL INFORMATION; UNIVERSAL APPROXIMATORS; PROBABILITY-MEASURES; MIN-REDUNDANCY; MAX-RELEVANCE; SYSTEMS; EVOLUTIONARY; CLASSIFIERS; ALGORITHMS; PARTITION
DOI
10.1016/j.ins.2015.09.045
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Fuzzy rule-based models have been extensively used in regression problems. Besides high accuracy, one of the most appreciated characteristics of these models is their interpretability, which is generally measured in terms of complexity. Complexity is affected by the number of features used for generating the model: the lower the number of features, the lower the complexity. Feature selection can therefore contribute considerably not only to speeding up the learning process, but also to improving the interpretability of the final model. Nevertheless, very few methods for selecting features before learning regression models have been proposed in the literature. In this paper, we focus on these methods, which perform feature selection as a pre-processing step. In particular, we have adapted two state-of-the-art feature selection algorithms, namely NMIFS and CFS, originally proposed for classification, to deal with regression. Further, we have proposed FMIFS, a novel forward sequential feature selection approach, based on the minimal-redundancy-maximal-relevance criterion, which can directly manage fuzzy partitions. The relevance and the redundancy of a feature are measured in terms of, respectively, the fuzzy mutual information between the feature and the output variable, and the average fuzzy mutual information between the feature and the already selected features. The stopping criterion for the sequential selection is based on the average values of relevance and redundancy of the already selected features. We have performed two experiments on twenty regression datasets. In the first experiment, we aimed to show the effectiveness of feature selection in fuzzy rule-based regression model generation by comparing the mean square errors achieved by the fuzzy rule-based models generated using all the features with those achieved using the features selected by FMIFS, NMIFS and CFS. In order to avoid possible biases related to the specific algorithm, we adopted the well-known Wang and Mendel algorithm for generating the fuzzy rule-based models. We show that the mean square errors obtained by models generated using the features selected by FMIFS are on average similar to the values achieved by using all the features, and lower than the ones obtained by employing the subsets of features selected by NMIFS and CFS. In the second experiment, we intended to evaluate how feature selection can reduce the convergence time of evolutionary fuzzy systems, which are probably the most effective fuzzy techniques for tackling regression problems. By using a state-of-the-art multi-objective evolutionary fuzzy system based on rule learning and membership function tuning, we show that the number of evaluations can be considerably reduced when pre-processing the dataset by feature selection. (C) 2015 Published by Elsevier Inc.
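The forward sequential selection outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' FMIFS implementation. It assumes uniform triangular fuzzy partitions, one common definition of fuzzy mutual information (joint probabilities estimated as averaged products of membership degrees), a relevance-minus-average-redundancy score, and a fixed feature budget `k` in place of the paper's stopping criterion, whose exact form the abstract does not give. All function and parameter names are hypothetical.

```python
import numpy as np


def triangular_memberships(x, n_sets=5):
    """Membership degrees of each sample in a uniform triangular partition.

    Assumption: n_sets >= 2 evenly spaced triangular fuzzy sets over
    [min(x), max(x)], so that each row of the result sums to 1.
    """
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:
        # Constant variable: put all membership in the first fuzzy set.
        m = np.zeros((x.size, n_sets))
        m[:, 0] = 1.0
        return m
    centers = np.linspace(lo, hi, n_sets)
    width = centers[1] - centers[0]
    return np.clip(1.0 - np.abs(x[:, None] - centers[None, :]) / width, 0.0, 1.0)


def fuzzy_mutual_information(mx, my):
    """Mutual information (in nats) between two fuzzy-partitioned variables.

    Assumption: the joint probability of fuzzy sets A_i and B_j is the mean
    over samples of the product of their membership degrees.
    """
    joint = mx.T @ my / mx.shape[0]
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / (px @ py)[mask])))


def forward_select(X, y, k, n_sets=5):
    """Greedy forward selection with a relevance-minus-redundancy score.

    Assumption: stops after k features (a stand-in stopping rule, not the
    criterion used in the paper).
    """
    X = np.asarray(X, dtype=float)
    mems = [triangular_memberships(X[:, j], n_sets) for j in range(X.shape[1])]
    my = triangular_memberships(y, n_sets)
    relevance = [fuzzy_mutual_information(m, my) for m in mems]

    selected = []
    remaining = list(range(X.shape[1]))
    while remaining and len(selected) < k:

        def score(j):
            if not selected:
                return relevance[j]
            redundancy = np.mean(
                [fuzzy_mutual_information(mems[j], mems[s]) for s in selected]
            )
            return relevance[j] - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

Calling `forward_select(X, y, k)` on a feature matrix `X` and a target vector `y` returns the column indices of the chosen features in selection order. The number of fuzzy sets per variable (`n_sets`) is a free parameter of this sketch.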
Pages: 649-669
Page count: 21
Related papers
53 in total
[1]   A Fast and Scalable Multiobjective Genetic Fuzzy System for Linguistic Fuzzy Modeling in High-Dimensional Regression Problems [J].
Alcala, Rafael ;
Jose Gacto, Maria ;
Herrera, Francisco .
IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2011, 19 (04) :666-681
[2]   A Multiobjective Evolutionary Approach to Concurrently Learn Rule and Data Bases of Linguistic Fuzzy-Rule-Based Systems [J].
Alcala, Rafael ;
Ducange, Pietro ;
Herrera, Francisco ;
Lazzerini, Beatrice ;
Marcelloni, Francesco .
IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2009, 17 (05) :1106-1122
[3]   Looking for a good fuzzy system interpretability index: An experimental approach [J].
Alonso, Jose M. ;
Magdalena, Luis ;
Gonzalez-Rodriguez, Gil .
INTERNATIONAL JOURNAL OF APPROXIMATE REASONING, 2009, 51 (01) :115-134
[4]  
[Anonymous], 1994, Journal of intelligent and Fuzzy systems
[5]   Genetic Training Instance Selection in Multiobjective Evolutionary Fuzzy Systems: A Coevolutionary Approach [J].
Antonelli, Michela ;
Ducange, Pietro ;
Marcelloni, Francesco .
IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2012, 20 (02) :276-290
[6]   Learning concurrently data and rule bases of Mamdani fuzzy rule-based systems by exploiting a novel interpretability index [J].
Antonelli, Michela ;
Ducange, Pietro ;
Lazzerini, Beatrice ;
Marcelloni, Francesco .
SOFT COMPUTING, 2011, 15 (10) :1981-1998
[7]   Learning knowledge bases of multi-objective evolutionary fuzzy systems by simultaneously optimizing accuracy, complexity and partition integrity [J].
Antonelli, Michela ;
Ducange, Pietro ;
Lazzerini, Beatrice ;
Marcelloni, Francesco .
SOFT COMPUTING, 2011, 15 (12) :2335-2354
[8]   Empirical study of feature selection methods based on individual feature evaluation for classification problems [J].
Arauzo-Azofra, Antonio ;
Aznarte, Jose Luis ;
Benitez, Jose M. .
EXPERT SYSTEMS WITH APPLICATIONS, 2011, 38 (07) :8170-8177
[9]   USING MUTUAL INFORMATION FOR SELECTING FEATURES IN SUPERVISED NEURAL-NET LEARNING [J].
BATTITI, R .
IEEE TRANSACTIONS ON NEURAL NETWORKS, 1994, 5 (04) :537-550
[10]  
Bleuler S, 2003, LECT NOTES COMPUT SC, V2632, P494