A new improved filter-based feature selection model for high-dimensional data

Cited: 0
Authors
Deepak Raj Munirathinam
Mohanasundaram Ranganadhan
Institutions
[1] Vellore Institute of Technology,School of Computer Science and Engineering
Source
The Journal of Supercomputing | 2020, Vol. 76
Keywords
Classification; Data mining; Feature selection; Relief; Bioinformatics; Noisy feature
DOI
Not available
Abstract
Preprocessing is ubiquitous in data mining, and selecting significant attributes is one of its most important steps. Feature selection builds a subset of relevant features for effective classification; for high-dimensional data, the classifier's performance usually depends on the feature subset chosen. The Relief algorithm is a popular heuristic approach for selecting significant feature subsets: it scores each feature individually and selects the top-scored features to form the subset. Many extensions of Relief have been developed, but an important defect of Relief-based algorithms has been overlooked for years: because the instances used to estimate feature scores may be uncertain or noisy, the resulting scores fluctuate with the sampled instances, which leads to poor classification accuracy. To fix this problem, a novel feature selection algorithm based on a Chebyshev distance-based outlier detection model is proposed, called noisy feature removal-Relief (NFR-ReliefF for short). To demonstrate the performance of NFR-ReliefF, extensive experiments, including classification tests, were carried out on nine benchmark high-dimensional datasets by combining the proposed model with standard classifiers, including naïve Bayes, C4.5, and KNN. The results show that NFR-ReliefF outperforms the other models on most of the tested datasets.
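The abstract outlines a two-stage pipeline: first filter suspected noisy instances with a Chebyshev (L-infinity) distance-based outlier test, then compute Relief feature scores on the cleaned data. The sketch below illustrates that idea only; the MAD-based threshold, function names, and toy data are assumptions, since the abstract does not specify the paper's exact procedure.

```python
import numpy as np

def chebyshev_outlier_mask(X, threshold=5.0):
    # Flag instances whose Chebyshev (L-infinity) distance from the
    # feature-wise median, scaled by the median absolute deviation (MAD),
    # exceeds `threshold`. The MAD scaling and cutoff are assumptions,
    # not the paper's stated criterion.
    med = np.median(X, axis=0)
    mad = np.median(np.abs(X - med), axis=0) + 1e-12
    cheb = np.max(np.abs(X - med) / mad, axis=1)  # L-inf distance per instance
    return cheb <= threshold  # True = keep the instance

def relief_scores(X, y, n_iter=100, rng=None):
    # Basic Relief: for a sampled instance, reward features that separate
    # it from its nearest miss (other class) and penalize features that
    # separate it from its nearest hit (same class).
    rng = np.random.default_rng(rng)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        i = rng.integers(n)
        dists = np.abs(X - X[i]).sum(axis=1)  # Manhattan distances
        dists[i] = np.inf                     # exclude the instance itself
        same = (y == y[i])
        hit = np.argmin(np.where(same, dists, np.inf))
        miss = np.argmin(np.where(~same, dists, np.inf))
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w / n_iter

# Toy data: only features 0 and 1 determine the class label.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

keep = chebyshev_outlier_mask(X)          # stage 1: drop noisy instances
scores = relief_scores(X[keep], y[keep], n_iter=200, rng=0)  # stage 2: score
top2 = set(np.argsort(scores)[-2:])       # highest-scored features
```

On this toy data the two relevant features receive the highest Relief scores, so the final subset recovers them; the instance filter simply removes the handful of points far from the feature-wise medians before scoring.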
Pages: 5745–5762 (17 pages)