Uncertainty Quantification in Graph-Based Classification of High Dimensional Data

被引:48
作者
Bertozzi, Andrea L. [1 ]
Luo, Xiyang [1 ]
Stuart, Andrew M. [2 ]
Zygalakis, Konstantinos C. [3 ]
机构
[1] Univ Calif Los Angeles, Dept Math, Los Angeles, CA 90024 USA
[2] CALTECH, Dept Comp & Math Sci, Pasadena, CA 91125 USA
[3] Univ Edinburgh, Sch Math, Edinburgh, Midlothian, Scotland
基金
英国工程与自然科学研究理事会; 美国国家科学基金会;
关键词
graph classification; uncertainty quantification; Gaussian prior; DIFFUSE INTERFACE MODELS; MCMC METHODS; CONVERGENCE; REGULARIZATION; ALGORITHMS;
D O I
10.1137/17M1134214
中图分类号
O1 [数学];
学科分类号
0701 ; 070101 ;
摘要
Classification of high dimensional data finds wide-ranging applications. In many of these applications equipping the resulting classification with a measure of uncertainty may be as important as the classification itself. In this paper we introduce, develop algorithms for, and investigate the properties of a variety of Bayesian models for the task of binary classification; via the posterior distribution on the classification labels, these methods automatically give measures of uncertainty. The methods are all based on the graph formulation of semisupervised learning. We provide a unified framework which brings together a variety of methods that have been introduced in different communities within the mathematical sciences. We study probit classification [C. K. Williams and C. E. Rasmussen, "Gaussian Processes for Regression," in Advances in Neural Information Processing Systems 8, MIT Press, 1996, pp. 514-520] in the graph-based setting, generalize the level-set method for Bayesian inverse problems [M. A. Iglesias, Y. Lu, and A. M. Stuart, Interfaces Free Bound., 18 (2016), pp. 181-217] to the classification setting, and generalize the Ginzburg-Landau optimization-based classifier [A. L. Bertozzi and A. Flenner, Multiscale Model. Simul., 10 (2012), pp. 1090-1118], [Y. Van Gennip and A. L. Bertozzi, Adv. Differential Equations, 17 (2012), pp. 1115-1180] to a Bayesian setting. We also show that the probit and level-set approaches are natural relaxations of the harmonic function approach introduced in [X. Zhu et al., "Semi-supervised Learning Using Gaussian Fields and Harmonic Functions," in ICML, Vol. 3, 2003, pp. 912-919]. We introduce efficient numerical methods, suited to large datasets, for both MCMC-based sampling and gradient-based MAP estimation. Through numerical experiments we study classification accuracy and uncertainty quantification for our models; these experiments showcase a suite of datasets commonly used to evaluate graph-based semisupervised learning algorithms.
引用
收藏
页码:568 / 595
页数:28
相关论文
共 56 条
[1]  
[Anonymous], 2 U STRATHCL
[2]  
[Anonymous], PREPRINT
[3]  
[Anonymous], COMPUTATIONAL UNPUB
[4]  
[Anonymous], 2003, P 20 INT C MACH LEAR
[5]  
[Anonymous], PREPRINT
[6]  
[Anonymous], 2007, INFORM SCI STAT
[7]  
[Anonymous], 2000, Advances in Neural Information Processing Systems
[8]  
[Anonymous], TR1530 U WISC
[9]  
[Anonymous], 2003, SCH CS CMU CIT
[10]  
[Anonymous], 2011, P INT C SAMPL THEOR