Feature selection using neighborhood uncertainty measures and Fisher score for gene expression data classification

Cited by: 6

Authors:

Xu, Jiucheng [1,2]

Qu, Kanglin [1,2]

Qu, Kangjian [3]

Hou, Qincheng [1,2]

Meng, Xiangru [1,2]

Affiliations:

[1] Henan Normal Univ, Coll Comp & Informat Engn, Xinxiang 453007, Peoples R China

[2] Engn Lab Intelligence Business & Internet Things, Xinxiang, Henan, Peoples R China

[3] Nanjing Inst Technol, Coll Comp Engn, Nanjing 210000, Peoples R China

Source:

INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS | 2023 / Vol. 14 / Issue 12

Funding:

National Natural Science Foundation of China;

Keywords:

Gene selection; Neighborhood rough set; Uncertainty measures; Fisher score; ATTRIBUTE REDUCTION; ROUGH SETS; ALGORITHM; INFORMATION;

DOI:

10.1007/s13042-023-01878-7

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The classification of gene expression data provides a basis for the study of pathogenesis and treatment. However, this type of data is characterized by high dimensionality and small sample sizes, which seriously degrade classification results. It is therefore necessary to use a gene selection algorithm to select key genes from gene expression data, but existing gene selection algorithms suffer from low classification precision and high time complexity. This paper proposes a gene selection algorithm using neighborhood uncertainty measures and the Fisher score. First, to make full use of the information provided by the neighborhood decision system, the neighborhood fusion coverage and neighborhood fusion credibility are defined based on the neighborhood coverage and neighborhood credibility, and they are used to characterize neighborhood uncertainty measures. Second, the neighborhood uncertainty measures are extended by combining the algebraic and information theory views, and a heuristic nonmonotonic gene selection algorithm is designed based on them. The algorithm evaluates the importance of genes from both the algebraic and information theory views, thereby selecting an optimal gene subset and improving classification precision. Third, the Fisher score method is introduced into the proposed algorithm to preliminarily eliminate redundant genes, reducing the computational time cost and improving the performance of the algorithm. Finally, experimental comparisons with existing gene selection algorithms on ten gene datasets show that the proposed algorithm can effectively improve the classification results for gene data.
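The Fisher score pre-filtering step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the commonly used definition of the Fisher score (between-class scatter of a gene divided by its within-class scatter, both weighted by class size); the record does not state the exact formula the authors use, and the function names `fisher_scores` and `top_k_genes` are hypothetical.

```python
from collections import defaultdict


def fisher_scores(samples, labels):
    """Return one Fisher score per gene (column) of `samples`.

    Assumed definition (not confirmed by the record):
    score_j = sum_k n_k * (mean_jk - mean_j)^2 / sum_k n_k * var_jk
    """
    n_genes = len(samples[0])
    # Group sample rows by class label.
    by_class = defaultdict(list)
    for row, label in zip(samples, labels):
        by_class[label].append(row)

    scores = []
    for j in range(n_genes):
        column = [row[j] for row in samples]
        overall_mean = sum(column) / len(column)
        between = 0.0
        within = 0.0
        for rows in by_class.values():
            values = [row[j] for row in rows]
            n_k = len(values)
            mean_k = sum(values) / n_k
            var_k = sum((v - mean_k) ** 2 for v in values) / n_k
            between += n_k * (mean_k - overall_mean) ** 2
            within += n_k * var_k
        # Guard against division by zero for genes that are constant within every class.
        scores.append(between / max(within, 1e-12))
    return scores


def top_k_genes(samples, labels, k):
    """Indices of the k genes with the highest Fisher score."""
    scores = fisher_scores(samples, labels)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Toy data: gene 0 separates the two classes, gene 1 does not.
toy_samples = [[0.0, 1.0], [0.1, 5.0], [1.0, 1.2], [1.1, 4.8]]
toy_labels = [0, 0, 1, 1]
print(top_k_genes(toy_samples, toy_labels, 1))
```

In a pipeline of the kind the abstract describes, such a ranking would only shrink the candidate gene set before the more expensive neighborhood-based selection runs; the cut-off `k` is a free parameter in this sketch.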

Pages: 4011-4028

Page count: 18

