Online Streaming Feature Selection for High-Dimensional and Class-Imbalanced Data Based on Max-Decision Boundary

Cited by: 0
Authors
Lin Y. [1, 2]
Chen X. [1, 2]
Bai S. [1, 2]
Wang C. [1, 2]
Affiliations
[1] School of Computer Science and Engineering, Minnan Normal University, Zhangzhou
[2] Key Laboratory of Data Science and Intelligence Application, The Education Department of Fujian Province, Minnan Normal University, Zhangzhou
Source
Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence | 2020 / Vol. 33 / No. 09
Funding
National Natural Science Foundation of China
Keywords
Adaptive Neighborhood; High-Dimensional and Class-Imbalanced Data; Neighborhood Rough Set; Online Feature Selection
DOI
10.16451/j.cnki.issn1003-6059.202009006
Chinese Library Classification number
Subject classification code
Abstract
The feature space of data changes dynamically over time. The training data are high-dimensional with a fixed number of features, and the label space is imbalanced. Motivated by these observations, an online streaming feature selection algorithm for high-dimensional and class-imbalanced data based on the max-decision boundary is proposed. Building on the neighborhood rough set, an adaptive neighborhood relation is defined that takes the effect of boundary samples into account, and a rough dependency formula with respect to the max-decision boundary is then designed. In addition, three online feature subset evaluation metrics are proposed to select features that discriminate well in both the majority and minority classes. Experiments on eleven high-dimensional and class-imbalanced datasets indicate that the proposed method outperforms some state-of-the-art online streaming feature selection algorithms. © 2020, Science Press. All rights reserved.
Pages: 820-829
Number of pages: 9
Related papers
19 in total
  • [1] DUDEK G., Artificial Immune System with Local Feature Selection for Short-Term Load Forecasting, IEEE Transactions on Evolutionary Computation, 21, 1, pp. 116-130, (2017)
  • [2] ROBNIK-SIKONJA M, KONONENKO I., Theoretical and Empirical Analysis of ReliefF and RReliefF, Machine Learning, 53, pp. 23-69, (2003)
  • [3] DING W, STEPINSKI T F, MU Y, et al., Subkilometer Crater Discovery with Boosting and Transfer Learning, ACM Transactions on Intelligent Systems and Technology, 2, 4, (2011)
  • [4] YU K, DING W, WU X D., LOFS: A Library of Online Streaming Feature Selection, Knowledge-Based Systems, 113, pp. 1-3, (2016)
  • [5] CHEN X Y, LIN Y J, WANG C X., Online Streaming Feature Selection for High-Dimensional and Class-Imbalanced Data Based on Neighborhood Rough Set, Pattern Recognition and Artificial Intelligence, 32, 8, pp. 726-735, (2019)
  • [6] LIU J H, LIN M L, WANG C X, et al., Multi-label Feature Selection Algorithm Based on Local Subspace, Pattern Recognition and Artificial Intelligence, 29, 3, pp. 240-251, (2016)
  • [7] WANG C X, LIN Y J, LIU J H., Feature Selection for Multi-label Learning with Missing Labels, Applied Intelligence, 49, 8, pp. 3027-3042, (2019)
  • [8] ZHOU P, HU X G, LI P P, et al., Online Feature Selection for High-Dimensional Class-Imbalanced Data, Knowledge-Based Systems, 136, pp. 187-199, (2017)
  • [9] LIU J H, LIN Y J, LI Y W, et al., Online Multi-label Streaming Feature Selection Based on Neighborhood Rough Set, Pattern Recognition, 84, pp. 273-287, (2018)
  • [10] LIN Y K, HU Q H, ZHANG J, et al., Multi-label Feature Selection with Streaming Labels, Information Sciences, 372, pp. 256-275, (2016)