A multi-criteria active learning method based on adaptive density clustering

Authors
He Z. [1, 2]
Zhu W. [1]
Chen X. [1]
Zhang X. [3]
Affiliations
[1] School of Control Engineering, Northeastern University at Qinhuangdao, Qinhuangdao
[2] Hebei Key Laboratory of Micro-Nano Sensing, Qinhuangdao
[3] School of Optoelectronics, Beijing Institute of Technology, Beijing
Source
Yi Qi Yi Biao Xue Bao/Chinese Journal of Scientific Instrument | 2024 / Vol. 45 / No. 03
Keywords
active learning; adaptive density clustering; Gaussian process regression; multi-criteria fusion; outlier robustness
DOI
10.19650/j.cnki.cjsi.J2312180
Abstract
Active learning (AL) helps train better machine learning models while minimizing labeling costs. Combining the representativeness-diversity (RD) and query-by-committee (QBC) algorithms overcomes the limitations of selecting samples by a single criterion. However, the K-means clustering on which RD is based may absorb outliers into its clusters, which degrades model performance, and QBC must maintain multiple models and provides sample information only indirectly. To address these issues, we propose an adaptive density clustering-based Gaussian process regression (ADC-GPR) algorithm, which selects samples efficiently by clustering first and then using uncertainty directly. The ADC clustering in this algorithm is robust against outliers and adapts to the distribution characteristics of the dataset, providing representative sample points and their corresponding clusters for the subsequent AL stage. The method ensures representativeness and diversity in unsupervised selection, and considers informativeness, representativeness, and diversity in supervised selection. Experimental results show that, for the same number of sampling iterations, the ADC-GPR algorithm improves average performance by 37.3%, 8%, and 2.8% over the random sampling (RS), Kennard-Stone (KS), and RD-GPR algorithms, respectively. The ADC-GPR algorithm also shows higher selection efficiency. © 2024 Science Press. All rights reserved.
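The cluster-first, uncertainty-second idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' ADC-GPR implementation: the abstract does not specify the ADC procedure, so a fixed-radius, DBSCAN-like grouping stands in for it (it is not adaptive), and the Gaussian process uses a fixed RBF kernel with no hyperparameter fitting. All function names and parameters (`density_clusters`, `select_next`, `eps`, `min_pts`, and so on) are hypothetical.

```python
import math

def rbf(a, b, length=1.0):
    """Squared-exponential kernel between two points (tuples of floats)."""
    return math.exp(-sum((u - v) ** 2 for u, v in zip(a, b)) / (2 * length ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def gp_variance(x, labeled, noise=1e-6):
    """GP posterior variance at x given the labeled inputs.
    With a fixed kernel this depends on the inputs only, not on the labels."""
    if not labeled:
        return rbf(x, x)
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(labeled)]
         for i, a in enumerate(labeled)]
    k_star = [rbf(x, a) for a in labeled]
    alpha = solve(K, k_star)
    return rbf(x, x) - sum(k * a for k, a in zip(k_star, alpha))

def density_clusters(points, eps, min_pts):
    """DBSCAN-like stand-in for ADC (assumption): one label per point, -1 = outlier."""
    n = len(points)
    neigh = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
             for i in range(n)]
    labels = [-1] * n
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or len(neigh[i]) < min_pts:
            continue  # already assigned, or not dense enough to seed a cluster
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            if len(neigh[p]) < min_pts:
                continue  # border point: belongs to the cluster but does not expand it
            for q in neigh[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    stack.append(q)
        cluster += 1
    return labels

def select_next(points, labels, labeled_idx):
    """Pick the unlabeled non-outlier with the largest GP variance,
    preferring clusters that hold no labeled sample yet."""
    labeled = [points[i] for i in labeled_idx]
    covered = {labels[i] for i in labeled_idx}
    pool = [i for i in range(len(points))
            if labels[i] != -1 and i not in labeled_idx]
    candidates = [i for i in pool if labels[i] not in covered] or pool
    if not candidates:
        return None
    return max(candidates, key=lambda i: gp_variance(points[i], labeled))

def run_selection(points, eps, min_pts, budget):
    """Cluster once, then select up to `budget` samples one at a time."""
    labels = density_clusters(points, eps, min_pts)
    chosen = []
    for _ in range(budget):
        nxt = select_next(points, labels, chosen)
        if nxt is None:
            break
        chosen.append(nxt)
    return labels, chosen

if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
           (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (20.0, 20.0)]
    print(run_selection(pts, eps=0.5, min_pts=3, budget=3))
```

In this sketch the isolated point is labeled as an outlier and is never offered for labeling, which mirrors the outlier-robustness motivation in the abstract. Preferring clusters without a labeled sample is one simple way to encourage diversity, while the variance ranking supplies the informativeness criterion.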
Pages: 179-187
Page count: 8