A quasi-Bayesian perspective to online clustering

Cited by: 9
Authors
Li, Le [1, 2]
Guedj, Benjamin [3 ]
Loustau, Sebastien [4 ]
Affiliations
[1] Univ Angers, Angers, France
[2] iAdvize, Nantes, France
[3] INRIA, Modal Project Team, Lille Nord Europe Res Ctr, Villers Les Nancy, France
[4] LumenAI, Paris, France
Source
ELECTRONIC JOURNAL OF STATISTICS | 2018 / Vol. 12 / No. 2
Keywords
Online clustering; quasi-Bayesian learning; mini-max regret bounds; reversible jump Markov chain Monte Carlo; LOSS BOUNDS; DATA SET; NUMBER; PREDICTION; GRADIENT; AGGREGATION; MODEL;
DOI
10.1214/18-EJS1479
Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline classification code
020208; 070103; 0714;
Abstract
When faced with high-frequency streams of data, clustering raises theoretical and algorithmic pitfalls. We introduce a new and adaptive online clustering algorithm relying on a quasi-Bayesian approach, with a dynamic (i.e., time-dependent) estimation of the (unknown and changing) number of clusters. We prove that our approach is supported by minimax regret bounds. We also provide an RJMCMC-flavored implementation (called PACBO, see https://cran.r-project.org/web/packages/PACBO/index.html) for which we give a convergence guarantee. Finally, numerical experiments illustrate the potential of our procedure.
Pages: 3071-3113
Number of pages: 43