Improved neighborhood space based feature selection algorithm for high-dimensional mixed data

Authors
Zhang T.-F. [1]
Zhang Y.-D. [1]
Ma F.-M. [2]
Affiliations
[1] College of Automation, College of Artificial Intelligence, Nanjing University of Posts and Telecommunications, Nanjing
[2] College of Information Engineering, Nanjing University of Finance and Economics, Nanjing
Source
Kongzhi yu Juece/Control and Decision | 2024 / Vol. 39 / No. 03
Keywords
evaluation function; feature selection; high-dimensional mixed data; neighborhood rough set; neighborhood space
DOI
10.13195/j.kzyjc.2022.0789
Abstract
As an important data preprocessing technology in the field of data mining, feature selection can effectively mitigate the "curse of dimensionality" caused by high-dimensional data. Nonetheless, how to perform feature selection on high-dimensional mixed data remains a focus and a difficulty of current research. Because it can handle mixed data in which categorical and numerical attributes coexist, the neighborhood rough set model has been widely used for feature selection on mixed data in recent years. However, existing measures of the neighborhood relation for mixed data still simply fuse the partition of categorical data based on an equivalence relation with the partition of numerical data based on a similarity relation. When features of high-dimensional mixed data are selected using a neighborhood space partitioned in this way together with a predefined evaluation function, adaptability is poor. Therefore, an improved method for constructing the neighborhood space is proposed on the basis of the neighborhood rough set model. Taking into account boundary-overlapping data and the size of the neighborhood space, an evaluation function is designed to characterize the discrimination ability of the neighborhood space. On this basis, a heuristic feature selection algorithm for high-dimensional mixed data is proposed. The validity and superiority of the proposed algorithm are verified on UCI standard data sets. © 2024 Northeast University. All rights reserved.
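To make the "simple fusion" that the abstract criticizes concrete, here is a minimal sketch of the conventional neighborhood relation for mixed data and the dependency measure usually built on it. It is not the improved construction or the evaluation function proposed in the paper, which the abstract does not specify. The function names, the Euclidean distance, the threshold `delta`, and the toy data are all assumptions made for illustration.

```python
from math import sqrt

def neighborhood(data, i, num_idx, cat_idx, delta):
    """Indices of the samples in the neighborhood of sample i (hypothetical helper)."""
    xi = data[i]
    result = []
    for j, xj in enumerate(data):
        # Categorical part: equivalence relation, every selected value must match.
        if any(xi[k] != xj[k] for k in cat_idx):
            continue
        # Numerical part: similarity relation, Euclidean distance within delta (assumed).
        dist = sqrt(sum((xi[k] - xj[k]) ** 2 for k in num_idx))
        if dist <= delta:
            result.append(j)
    return result

def dependency(data, labels, num_idx, cat_idx, delta):
    """Fraction of samples whose neighborhood contains only their own class."""
    consistent = 0
    for i in range(len(data)):
        nbrs = neighborhood(data, i, num_idx, cat_idx, delta)
        if all(labels[j] == labels[i] for j in nbrs):
            consistent += 1
    return consistent / len(data)

# Toy data (made up): column 0 is numerical, scaled to [0, 1]; column 1 is categorical.
data = [(0.10, "a"), (0.15, "a"), (0.20, "b"), (0.80, "b"), (0.85, "b")]
labels = [0, 0, 1, 1, 1]

both = dependency(data, labels, num_idx=[0], cat_idx=[1], delta=0.12)
numeric_only = dependency(data, labels, num_idx=[0], cat_idx=[], delta=0.12)
print(both, numeric_only)
```

On this toy data the numerical feature alone leaves three samples with mixed-class neighborhoods, while adding the categorical feature separates them. A heuristic selector in this style would greedily add the feature that raises such a score the most; the paper replaces both the neighborhood construction and the score with its own definitions.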
Pages: 929-938
Page count: 9