Border Sampling Through Coupling Markov Chain Monte Carlo

Cited by: 1
Authors
Li, Guichong [1]
Japkowicz, Nathalie [1]
Stocki, Trevor J. [2]
Ungar, R. Kurt [2]
Affiliations
[1] Comp Sci Univ Ottawa, Ottawa, ON, Canada
[2] Health Canada, Radiat Protect Bureau, Ottawa, ON, Canada
Source
ICDM 2008: EIGHTH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS | 2008
DOI
10.1109/ICDM.2008.52
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recently, Progressive Border Sampling (PBS) was proposed for sample selection in supervised learning; it progressively learns an augmented full border from small labeled datasets. However, this quadratic-time learning algorithm is inapplicable to large datasets. In this paper, we incorporate PBS into a state-of-the-art technique called Coupling Markov Chain Monte Carlo (CMCMC) in an attempt to scale the original algorithm up to large labeled datasets. CMCMC can produce an exact sample, whereas a naive Markov Chain Monte Carlo strategy cannot guarantee convergence to a stationary distribution. The resulting CMCMC-PBS algorithm is thus proposed for border sampling on large datasets. CMCMC-PBS exhibits several remarkable characteristics: linear time complexity, learner independence, and consistent convergence to an optimal sample from the original training sets by learning from their subsamples. Our experimental results on 33 labeled datasets, both small and large, from the UCI KDD repository and on a nuclear security application show that our new approach outperforms many previous sampling techniques for sample selection.
Pages: 393 / +
Page count: 3