Uncertainty Quantification in Graph-Based Classification of High Dimensional Data

被引：48

作者：

Bertozzi, Andrea L. ^{[1
]}

Luo, Xiyang ^{[1
]}

Stuart, Andrew M. ^{[2
]}

Zygalakis, Konstantinos C. ^{[3
]}

机构：

[1] Univ Calif Los Angeles, Dept Math, Los Angeles, CA 90024 USA

[2] CALTECH, Dept Comp & Math Sci, Pasadena, CA 91125 USA

[3] Univ Edinburgh, Sch Math, Edinburgh, Midlothian, Scotland

来源：

SIAM-ASA JOURNAL ON UNCERTAINTY QUANTIFICATION | 2018年 / 6卷 / 02期

基金：

英国工程与自然科学研究理事会; 美国国家科学基金会;

关键词：

graph classification; uncertainty quantification; Gaussian prior; DIFFUSE INTERFACE MODELS; MCMC METHODS; CONVERGENCE; REGULARIZATION; ALGORITHMS;

D O I：

10.1137/17M1134214

中图分类号：

O1 [数学];

学科分类号：

0701 ; 070101 ;

摘要：

Classification of high dimensional data finds wide-ranging applications. In many of these applications equipping the resulting classification with a measure of uncertainty may be as important as the classification itself. In this paper we introduce, develop algorithms for, and investigate the properties of a variety of Bayesian models for the task of binary classification; via the posterior distribution on the classification labels, these methods automatically give measures of uncertainty. The methods are all based on the graph formulation of semisupervised learning. We provide a unified framework which brings together a variety of methods that have been introduced in different communities within the mathematical sciences. We study probit classification [C. K. Williams and C. E. Rasmussen, "Gaussian Processes for Regression," in Advances in Neural Information Processing Systems 8, MIT Press, 1996, pp. 514-520] in the graph-based setting, generalize the level-set method for Bayesian inverse problems [M. A. Iglesias, Y. Lu, and A. M. Stuart, Interfaces Free Bound., 18 (2016), pp. 181-217] to the classification setting, and generalize the Ginzburg-Landau optimization-based classifier [A. L. Bertozzi and A. Flenner, Multiscale Model. Simul., 10 (2012), pp. 1090-1118], [Y. Van Gennip and A. L. Bertozzi, Adv. Differential Equations, 17 (2012), pp. 1115-1180] to a Bayesian setting. We also show that the probit and level-set approaches are natural relaxations of the harmonic function approach introduced in [X. Zhu et al., "Semi-supervised Learning Using Gaussian Fields and Harmonic Functions," in ICML, Vol. 3, 2003, pp. 912-919]. We introduce efficient numerical methods, suited to large datasets, for both MCMC-based sampling and gradient-based MAP estimation. Through numerical experiments we study classification accuracy and uncertainty quantification for our models; these experiments showcase a suite of datasets commonly used to evaluate graph-based semisupervised learning algorithms.

引用

页码：568 / 595

页数：28

共 56 条

[1]

[Anonymous], 2 U STRATHCL

[2]

[Anonymous], PREPRINT

[3]

[Anonymous], COMPUTATIONAL UNPUB

[4]

[Anonymous], 2003, P 20 INT C MACH LEAR

[5]

[Anonymous], PREPRINT

[6]

[Anonymous], 2007, INFORM SCI STAT

[7]

[Anonymous], 2000, Advances in Neural Information Processing Systems

[8]

[Anonymous], TR1530 U WISC

[9]

[Anonymous], 2003, SCH CS CMU CIT

[10]

[Anonymous], 2011, P INT C SAMPL THEOR

← 1 2 3 4 5 6 →