Feature selection using neighborhood uncertainty measures and Fisher score for gene expression data classification

Authors
Jiucheng Xu
Kanglin Qu
Kangjian Qu
Qincheng Hou
Xiangru Meng
Affiliations
[1] Henan Normal University, College of Computer and Information Engineering
[2] Engineering Lab of Intelligence Business & Internet of Things, College of Computer Engineering
[3] Nanjing Institute of Technology
Source
International Journal of Machine Learning and Cybernetics | 2023 / Vol. 14
Keywords
Gene selection; Neighborhood rough set; Uncertainty measures; Fisher score
DOI
Not available
Abstract
The classification of gene expression data provides a basis for studying pathogenesis and treatment. However, such data are high-dimensional with small sample sizes, which seriously degrades classification results. A gene selection algorithm is therefore needed to pick out the key genes and improve classification, yet existing gene selection algorithms suffer from low classification precision and high time complexity. This paper proposes a gene selection algorithm based on neighborhood uncertainty measures and the Fisher score. First, to make full use of the information in the neighborhood decision system, neighborhood fusion coverage and neighborhood fusion credibility are defined on the basis of neighborhood coverage and neighborhood credibility, and they are used to characterize neighborhood uncertainty measures. Second, these measures are extended by combining the algebraic and information-theoretic views, and a heuristic nonmonotonic gene selection algorithm is designed on top of them. The algorithm evaluates the importance of genes from both views, which allows it to select an optimal gene subset and improve classification precision. Third, the Fisher score is introduced into the algorithm to preliminarily eliminate redundant genes, reducing the computational cost and improving the algorithm's performance. Finally, experiments comparing the proposed algorithm with existing gene selection algorithms on ten gene expression datasets show that it effectively improves classification results for gene data.
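The two-stage pipeline described in the abstract (a Fisher-score prefilter followed by heuristic forward selection driven by a neighborhood rough-set measure) can be sketched as follows. This is a minimal illustration, not the authors' method: the abstract does not define neighborhood fusion coverage or neighborhood fusion credibility, so the sketch substitutes the classical neighborhood dependency (the fraction of samples in the positive region). It assumes features are normalized to [0, 1] and uses Euclidean distance; the function names and the parameters `top_k` and `delta` are hypothetical.

```python
import math
from typing import List, Sequence


def fisher_scores(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Fisher score per feature: between-class scatter over within-class scatter."""
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mu = sum(col) / len(col)
        num = den = 0.0
        for c in classes:
            vals = [v for v, label in zip(col, y) if label == c]
            n_c = len(vals)
            mu_c = sum(vals) / n_c
            var_c = sum((v - mu_c) ** 2 for v in vals) / n_c
            num += n_c * (mu_c - mu) ** 2  # between-class term
            den += n_c * var_c             # within-class term
        scores.append(num / den if den > 0 else (math.inf if num > 0 else 0.0))
    return scores


def neighborhood_dependency(X, y, subset: List[int], delta: float) -> float:
    """Classical dependency |POS_B(D)| / n (a stand-in for the paper's fusion measures).

    A sample is in the positive region if every sample within distance delta
    (measured on the features in `subset`) carries the same class label.
    """
    if not subset:
        return 0.0
    n = len(X)
    pos = 0
    for i in range(n):
        consistent = True
        for k in range(n):
            d = math.sqrt(sum((X[i][j] - X[k][j]) ** 2 for j in subset))
            if d <= delta and y[k] != y[i]:
                consistent = False
                break
        if consistent:
            pos += 1
    return pos / n


def select_genes(X, y, top_k: int, delta: float = 0.15) -> List[int]:
    """Hypothetical pipeline: Fisher prefilter, then greedy forward selection."""
    scores = fisher_scores(X, y)
    # Stage 1: keep only the top_k genes by Fisher score to cut computation.
    candidates = sorted(range(len(X[0])), key=lambda j: scores[j], reverse=True)[:top_k]
    selected: List[int] = []
    best = 0.0
    # Stage 2: add the gene that most increases dependency; stop when nothing improves it.
    while candidates:
        gain, j = max((neighborhood_dependency(X, y, selected + [j], delta), j)
                      for j in candidates)
        if gain <= best:
            break
        selected.append(j)
        candidates.remove(j)
        best = gain
    return selected
```

On a toy dataset where only the first feature separates the two classes, the prefilter ranks that feature highest and the forward search stops after selecting it, because no further gene can raise the dependency above 1.0:

```python
X = [[0.1, 0.5, 0.9], [0.2, 0.4, 0.1], [0.9, 0.5, 0.8], [0.8, 0.6, 0.2]]
y = [0, 0, 1, 1]
print(select_genes(X, y, top_k=2, delta=0.15))  # [0]
```

The prefilter is where the time savings come from: the pairwise neighborhood computation is quadratic in the number of samples for each candidate gene, so shrinking the candidate pool before the greedy search is much cheaper than scoring every gene in a high-dimensional expression matrix.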
Pages: 4011–4028