Mixed measure-based feature selection using the Fisher score and neighborhood rough sets

Cited by: 0

Authors:

Lin Sun

Jiuxiao Zhang

Weiping Ding

Jiucheng Xu

Affiliations:

[1] College of Computer and Information Engineering

[2] Henan Normal University

[3] Key Laboratory of Artificial Intelligence and Personalized Learning in Education of Henan Province

[4] School of Information Science and Technology

[5] Nantong University

Source:

Applied Intelligence | 2022 / Vol. 52

Keywords:

Feature selection; Neighborhood rough sets; Fisher score; Neighborhood decision system; Data classification;

DOI:

Not available

Chinese Library Classification number:

Subject classification code:

Abstract:

Existing feature selection methods often neglect the distribution of the data, and in most cases require the neighborhood radius in neighborhood rough sets (NRS) to be selected manually. These limitations lead to the misclassification of samples. To address these drawbacks, this paper presents a mixed measure-based feature selection method using the Fisher score and an NRS model. First, the variation coefficient of the features in different decision classes is defined to describe the degree of dispersion of the different features; on this basis, the neighborhood class is described and a novel NRS model is developed. The concepts of dependency degree, neighborhood knowledge granularity, and average neighborhood entropy are then defined, and a mixed measure combining the information and algebra views is proposed to measure the uncertainty in neighborhood decision systems. Second, the average correlation degree of the feature subset is computed to assess the redundancy of the reduced feature subset. By combining the classification accuracy of the selected features, the reduction rate of the classification result, and the average correlation degree of the reduced feature set, an adaptive neighborhood radius function is constructed to avoid manual selection of the optimal neighborhood radius. An optimal feature subset can then be obtained according to the internal and external significance of the features. Third, the variation coefficient of the samples in different decision classes in each feature is defined to compute the degree of dispersion of the samples, and the average of all samples in each feature is added to the between-class scatter to eliminate the effect of the different measurement scales of the features; the Fisher score model is thereby improved to reduce the noise in high-dimensional data. Finally, a heuristic feature selection algorithm with the Fisher score based on the new NRS model is designed to select an optimal feature subset. Experimental results on five low-dimensional UCI datasets and nine high-dimensional gene expression datasets show that the developed algorithm is effective and can select an optimal reduced subset with high classification accuracy when compared with some of the latest algorithms.
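
For orientation, the two standard building blocks named in the abstract can be sketched as follows. This is a minimal illustration of the classical Fisher score and of the classical NRS dependency degree with a fixed radius, not the improved Fisher score, the variation-coefficient-based neighborhood, or the adaptive radius function proposed in the paper. The function names, the Euclidean distance, and the toy data in the usage note are assumptions made only for this sketch.

```python
import numpy as np


def fisher_score(X, y):
    """Classical Fisher score per feature (assumed textbook form):
    between-class scatter divided by within-class scatter."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xc = X[y == label]
        n_c = Xc.shape[0]
        between += n_c * (Xc.mean(axis=0) - overall_mean) ** 2
        within += n_c * Xc.var(axis=0)
    # Guard against a zero within-class scatter.
    within = np.where(within > 0, within, np.finfo(float).eps)
    return between / within


def neighborhood_dependency(X, y, features, radius):
    """Classical NRS dependency degree with a fixed, hand-picked radius:
    the fraction of samples whose neighborhood (Euclidean distance on the
    chosen features) contains only samples of the same decision class."""
    Xs = np.asarray(X, dtype=float)[:, features]
    y = np.asarray(y)
    dists = np.linalg.norm(Xs[:, None, :] - Xs[None, :, :], axis=2)
    neighbors = dists <= radius
    consistent = [np.all(y[neighbors[i]] == y[i]) for i in range(len(y))]
    return float(np.mean(consistent))
```

On a toy dataset such as `X = [[0.0, 5.0], [0.1, 1.0], [1.0, 5.1], [1.1, 1.1]]` with `y = [0, 0, 1, 1]`, the first feature separates the two classes and the second does not, so the first feature receives the higher Fisher score and the higher dependency degree. The radius here is a fixed input, which is exactly the manual choice the paper's adaptive radius function is meant to avoid.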

Citation:

Pages: 17264-17288

Page count: 24
