Gravitational search algorithm and K-means for simultaneous feature selection and data clustering: a multi-objective approach

Cited by: 0
Authors
Jay Prakash
Pramod Kumar Singh
Affiliations
[1] ABV - Indian Institute of Information Technology and Management Gwalior, Computational Intelligence and Data Mining Research Laboratory
Source
Soft Computing | 2019 / Volume 23
Keywords
Feature selection; Data clustering; Multi-objective optimization; Gravitational search algorithm;
DOI: not available
Abstract
Clustering is an unsupervised classification method used to group the objects of an unlabeled data set. High-dimensional data sets generally comprise irrelevant and redundant features along with the relevant ones, which deteriorates the clustering result. Therefore, feature selection, which selects a subset of relevant features, is necessary as it improves the discrimination ability of the original feature set and thereby the clustering result. Though many metaheuristics have been suggested to select a subset of relevant features in a wrapper framework based on some criterion, most of them are marred by three key issues. First, they require the objects' class information a priori, which is unknown in unsupervised feature selection. Second, feature subset selection is devised on a single validity measure; hence, it produces a single best solution biased toward the cardinality of the feature subset. Third, they find it difficult to avoid local optima owing to a lack of balance between exploration and exploitation in the feature search space. To deal with the first issue, we use an unsupervised feature selection method in which no class information is required. To address the second issue, we follow a Pareto-based approach to obtain diverse trade-off solutions by optimizing two conceptually contradicting validity measures, the silhouette index (Sil) and the feature cardinality (d). For the third issue, we introduce a genetic crossover operator to improve diversity in a recent Newtonian-gravity-based metaheuristic, the binary gravitational search algorithm (BGSA), in a multi-objective optimization scenario; the resulting method is named improved multi-objective BGSA for feature selection (IMBGSAFS). We use ten real-world data sets to compare the IMBGSAFS results with three multi-objective wrapper methods, MBGSA, MOPSO, and NSGA-II, and with a multi-objective filter method based on Pearson's linear correlation coefficient (FM-CC).
We employ four multi-objective quality measures: convergence, diversity, coverage, and ONVG. The obtained results show the superiority of IMBGSAFS over its competitors. An external clustering validity index, the F-measure, also establishes this finding. As the decision maker picks only a single solution from the set of trade-off solutions, we employ the F-measure to select a final single solution from the external archive. The quality of the final solution achieved by IMBGSAFS is superior to that of its competitors in terms of clustering accuracy and/or smaller subset size.
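The abstract's Pareto-based approach keeps the trade-off solutions that are non-dominated with respect to the two objectives: maximize the silhouette index (Sil) and minimize the feature cardinality (d). A minimal sketch of that non-dominance filtering follows; the function names and the candidate scores are illustrative, not taken from the paper's implementation.

```python
# Illustrative sketch: Pareto filtering of candidate feature subsets.
# Each candidate is scored as (sil, d): silhouette index to maximize,
# feature cardinality to minimize.

def dominates(a, b):
    """True if a = (sil, d) Pareto-dominates b: no worse on both
    objectives and strictly better on at least one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1]
    strictly_better = a[0] > b[0] or a[1] < b[1]
    return no_worse and strictly_better

def pareto_front(solutions):
    """Return the non-dominated (sil, d) pairs, i.e. the trade-off
    set a multi-objective wrapper would keep in its external archive."""
    return [s for s in solutions
            if not any(dominates(t, s) for t in solutions if t != s)]

candidates = [(0.62, 5), (0.70, 8), (0.55, 3), (0.62, 9), (0.70, 6)]
print(pareto_front(candidates))  # -> [(0.62, 5), (0.55, 3), (0.70, 6)]
```

Here (0.70, 8) and (0.62, 9) are discarded because (0.70, 6) and (0.62, 5) achieve the same silhouette with fewer features; the decision maker would then pick one archive member, which the paper does via the F-measure.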
Pages: 2083-2100 (17 pages)