A clustering-based active learning method to query informative and representative samples

Cited by: 0
Authors
Xuyang Yan
Shabnam Nazmi
Biniam Gebru
Mohd Anwar
Abdollah Homaifar
Mrinmoy Sarkar
Kishor Datta Gupta
Affiliations
[1] North Carolina A&T State University
Source
Applied Intelligence | 2022 / Vol. 52
Keywords
Active learning; Clustering; Informative-based query; Representative-based query; Center-based selection; Boundary-based selection
DOI
Not available
Abstract
Active learning (AL) has been widely used to address the shortage of labeled datasets. Yet most AL techniques require an initial set of labeled data as the knowledge base for active querying. The informativeness of this initial labeled set significantly affects the subsequent active queries and, hence, the performance of active learning. In this paper, a new clustering-based active learning framework, namely Active Learning using a Clustering-based Sampling (ALCS), is proposed to simultaneously consider the representativeness and informativeness of samples using no prior label information. A density-based clustering approach is employed to explore the cluster structure of the data without requiring exhaustive parameter tuning. A simple yet effective distance-based querying strategy is adopted to adjust the sampling weight between the center-based and boundary-based selections for active learning. A novel bi-cluster boundary-based sample query procedure is introduced to select the most uncertain samples across the boundary between adjacent clusters. Additionally, we develop an effective diversity exploration strategy to address the redundancy among queried samples. Extensive experiments compare ALCS with state-of-the-art methods and show that ALCS achieves performance that is statistically better than or comparable to theirs.
Pages: 13250-13267
Number of pages: 17