Border Sampling Through Coupling Markov Chain Monte Carlo

被引:1
作者
Li, Guichong [1 ]
Japkowicz, Nathalie [1 ]
Stocki, Trevor J. [2 ]
Ungar, R. Kurt [2 ]
机构
[1] Comp Sci Univ Ottawa, Ottawa, ON, Canada
[2] Health Canada, Radiat Protect Bureau, Ottawa, ON, Canada
来源
ICDM 2008: EIGHTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS | 2008年
关键词
D O I
10.1109/ICDM.2008.52
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Recently, Progressive Border Sampling (PBS) was proposed for sample selection in supervised learning by progressively learning an augmented full border from small labeled datasets. However, this quadratic learning algorithm is inapplicable to large datasets. In this paper, we incorporate the PBS to a state of the art technique called Coupling Markov Chain Monte Carlo (CMCMC) in an attempt to scale the original algorithm up, on large labeled datasets. The CMCMC can produce an exact sample while a naive strategy for Markov Chain Monte Carlo cannot guarantee the convergence to a stationary distribution. The resulting CMCMC PBS algorithm is thus proposed for border sampling on large datasets. CMCMC-PBS exhibits several remarkable characteristics: linear time complexity, learner-independence, and a consistent convergence to an optimal sample from the original training sets by learning from their subsamples. Our experimental results on the 33 either small or large labeled datasets from the UCIKDD repository and a nuclear security application show that our new approach outperforms many previous sampling techniques for sample selection.
引用
收藏
页码:393 / +
页数:3
相关论文
共 50 条
  • [31] Importance Sampling in Stochastic Programming: A Markov Chain Monte Carlo Approach
    Parpas, Panos
    Ustun, Berk
    Webster, Mort
    Quang Kha Tran
    INFORMS JOURNAL ON COMPUTING, 2015, 27 (02) : 358 - 377
  • [32] Feature selection by Markov chain Monte Carlo sampling - A Bayesian approach
    Egmont-Petersen, M
    STRUCTURAL, SYNTACTIC, AND STATISTICAL PATTERN RECOGNITION, PROCEEDINGS, 2004, 3138 : 1034 - 1042
  • [33] A mixture representation of π with applications in Markov chain Monte Carlo and perfect sampling
    Hobert, JP
    Robert, CP
    ANNALS OF APPLIED PROBABILITY, 2004, 14 (03) : 1295 - 1305
  • [34] Occupancy Grid Mapping with Markov Chain Monte Carlo Gibbs Sampling
    Merali, Rehman S.
    Barfoot, Timothy D.
    2013 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2013, : 3183 - 3189
  • [35] Accelerated Markov chain Monte Carlo sampling in electrical capacitance tomography
    Watzenig, Daniel
    Neumayer, Markus
    Fox, Colin
    COMPEL-THE INTERNATIONAL JOURNAL FOR COMPUTATION AND MATHEMATICS IN ELECTRICAL AND ELECTRONIC ENGINEERING, 2011, 30 (06) : 1842 - 1854
  • [36] Comparison of Markov Chain Monte Carlo sampling algorithms in subset simulation
    Lan C.
    Xu Z.
    Ma J.
    Zhao X.
    Li H.
    Tumu Gongcheng Xuebao/China Civil Engineering Journal, 2022, 55 (10): : 33 - 45and79
  • [37] Iterative importance sampling with Markov chain Monte Carlo sampling in robust Bayesian analysis
    Cruz, Ivette Raices
    Lindstroem, Johan
    Troffaes, Matthias C. M.
    Sahlin, Ullrika
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2022, 176
  • [38] Bayesian Genome Assembly and Assessment by Markov Chain Monte Carlo Sampling
    Howison, Mark
    Zapata, Felipe
    Edwards, Erika J.
    Dunn, Casey W.
    PLOS ONE, 2014, 9 (06):
  • [39] Implementation of a practical Markov chain Monte Carlo sampling algorithm in PyBioNetFit
    Neumann, Jacob
    Lin, Yen Ting
    Mallela, Abhishek
    Miller, Ely F.
    Colvin, Joshua
    Duprat, Abell T.
    Chen, Ye
    Hlavacek, William S.
    Posner, Richard G.
    BIOINFORMATICS, 2022, 38 (06) : 1770 - 1772
  • [40] Monte Carlo sampling of a Markov web
    Boulougouris, GC
    Frenkel, D
    JOURNAL OF CHEMICAL THEORY AND COMPUTATION, 2005, 1 (03) : 389 - 393