A stochastic approximation approach to fixed instance selection

Cited by: 0
Authors
Yeo G.F.A. [1 ]
Akman D. [1 ]
Hudson I. [1 ]
Chan J. [2 ]
Affiliations
[1] School of Science (Mathematical Sciences), Royal Melbourne Institute of Technology, 124 La Trobe St, Melbourne, 3000, VIC
[2] School of Computing Technologies (Computer Science), Royal Melbourne Institute of Technology, 124 La Trobe St, Melbourne, 3000, VIC
Keywords
Dimensionality reduction; Gradient descent optimisation; Instance selection; Stochastic approximation;
DOI
10.1016/j.ins.2023.01.090
Abstract
Instance selection plays a critical role in enhancing the efficacy and efficiency of machine learning tools used for data mining tasks. This study proposes a fixed instance selection algorithm, SpFixedIS, based on simultaneous perturbation stochastic approximation, which works in conjunction with any supervised machine learning method and any corresponding performance metric. The algorithm provides an approximate solution to the NP-hard instance selection problem and additionally serves as a way of intelligently selecting a specified number of instances from a training set with regard to a machine learning model. The shape of the objective function, obtained from test accuracy plotted against the number of instances selected, is examined extensively for our instance selection algorithm. SpFixedIS was tested on 43 diverse datasets across 6 different machine learning classifiers. The results show that in over 90% of cases SpFixedIS provides a statistically significant improvement at the 5% level over random selection of the same number of instances. Furthermore, for probabilistic models, specifically Gaussian Naive Bayes, SpFixedIS provides a statistically significant improvement over models trained on the entire training set in 84% of the tested settings, which ranged from 50 to 1000 instances. © 2023
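The core idea described above can be sketched in code: maintain a continuous score per training instance, take the top-k scores as the selected subset, and update the scores with the SPSA two-point gradient estimate of the chosen performance metric. This is a minimal illustrative sketch, not the authors' SpFixedIS implementation; the 1-NN accuracy objective, the gain constants `a` and `c`, and all function names are assumptions standing in for "any supervised learner and metric".

```python
import numpy as np

def top_k(weights, k):
    # Indices of the k largest scores: the currently selected instances.
    return np.argsort(weights)[-k:]

def accuracy(sel, X_tr, y_tr, X_val, y_val):
    # 1-nearest-neighbour validation accuracy using only the selected
    # instances (a stand-in for "any supervised learner + metric").
    d = np.linalg.norm(X_val[:, None, :] - X_tr[sel][None, :, :], axis=2)
    pred = y_tr[sel][np.argmin(d, axis=1)]
    return float(np.mean(pred == y_val))

def spsa_fixed_is(X_tr, y_tr, X_val, y_val, k, iters=100,
                  a=0.5, c=0.1, seed=0):
    # Fixed-size instance selection via SPSA (illustrative sketch):
    # optimise a continuous score vector w; the selected subset is top_k(w, k).
    rng = np.random.default_rng(seed)
    n = len(X_tr)
    w = rng.normal(0.0, 0.01, size=n)
    for _ in range(iters):
        delta = rng.choice([-1.0, 1.0], size=n)  # Rademacher perturbation
        f_plus = accuracy(top_k(w + c * delta, k), X_tr, y_tr, X_val, y_val)
        f_minus = accuracy(top_k(w - c * delta, k), X_tr, y_tr, X_val, y_val)
        # SPSA gradient estimate from just two objective evaluations,
        # regardless of the number of instances n.
        ghat = (f_plus - f_minus) / (2.0 * c) * delta
        w += a * ghat  # ascend: maximise the accuracy objective
    return top_k(w, k)
```

The appeal of SPSA here is that each iteration costs only two model evaluations however large the training set is, which is what makes an approximate search over the combinatorial instance-subset space tractable.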
Pages: 558-579
Page count: 21