A new improved filter-based feature selection model for high-dimensional data

Cited by: 16
Authors
Munirathinam, Deepak Raj [1 ]
Ranganadhan, Mohanasundaram [1 ]
Affiliations
[1] Vellore Inst Technol, Sch Comp Sci & Engn, Vellore 632014, Tamil Nadu, India
Keywords
Classification; Data mining; Feature selection; Relief; Bioinformatics; Noisy feature;
DOI
10.1007/s11227-019-02975-7
CLC Classification Number
TP3 [Computing Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Preprocessing of data is ubiquitous, and choosing significant attributes is one of the important steps in preparing data. Feature selection creates a subset of relevant features for effective classification. In the classification of high-dimensional data, the classifier's performance usually depends on the feature subset used for classification. The Relief algorithm is a popular heuristic approach for selecting significant feature subsets: it estimates the relevance of each feature individually and selects the top-scored features to form the subset. Many extensions of the Relief algorithm have been developed. However, an important defect of Relief-based algorithms has been ignored for years: because of the uncertainty and noise of the instances used to estimate feature scores, the resulting scores vacillate with the sampled instances, which leads to poor classification accuracy. To fix this problem, a novel feature selection algorithm based on a Chebyshev distance outlier detection model, called noisy feature removal-Relief (NFR-ReliefF for short), is proposed. To demonstrate the performance of the NFR-ReliefF algorithm, extensive experiments, including classification tests, were carried out on nine benchmark high-dimensional datasets by combining the proposed model with standard classifiers, including naive Bayes, C4.5 and KNN. The results show that NFR-ReliefF outperforms the other models on most of the tested datasets.
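The pipeline described in the abstract (detect noisy instances with a Chebyshev distance model, discard them, then run a Relief-style scorer on the cleaned data) can be illustrated with the minimal Python sketch below. The function names, the centroid-based z-score-style cutoff, and the single-neighbor Relief variant are illustrative assumptions for exposition, not the authors' exact NFR-ReliefF formulation.

import numpy as np

def remove_chebyshev_outliers(X, y, threshold=2.5):
    """Drop instances whose Chebyshev distance to their class centroid is extreme (assumed rule)."""
    keep = np.ones(len(X), dtype=bool)
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        centroid = X[idx].mean(axis=0)
        # Chebyshev (L-infinity) distance of each instance to its class centroid
        d = np.max(np.abs(X[idx] - centroid), axis=1)
        cutoff = d.mean() + threshold * d.std()
        keep[idx[d > cutoff]] = False
    return X[keep], y[keep]

def relief_scores(X, y, n_iter=100, seed=None):
    """Basic Relief scoring: reward separation from the nearest miss,
    penalize distance to the nearest hit (single-neighbor variant)."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    w = np.zeros(m)
    span = X.max(axis=0) - X.min(axis=0) + 1e-12   # per-feature normalizer
    for _ in range(n_iter):
        i = rng.integers(n)
        diff = np.abs(X - X[i]) / span
        dist = diff.sum(axis=1)
        dist[i] = np.inf
        same = (y == y[i])
        same[i] = False
        hit = np.argmin(np.where(same, dist, np.inf))    # nearest same-class instance
        miss = np.argmin(np.where(~same, dist, np.inf))  # nearest other-class instance
        w += diff[miss] - diff[hit]
    return w / n_iter

# Usage sketch: filter noisy instances first, then rank features by Relief weight.
# X, y = load_some_high_dimensional_dataset()   # hypothetical loader
# Xc, yc = remove_chebyshev_outliers(X, y)
# top_features = np.argsort(relief_scores(Xc, yc))[::-1][:50]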
Pages: 5745-5762
Number of pages: 18