Infinite Liouville mixture models with application to text and texture categorization

Cited by: 37
Authors
Bouguila, Nizar [1]
Affiliations
[1] Concordia Univ, Fac Engn & Comp Sci, Concordia Inst Informat Syst Engn, Montreal, PQ H3G 2W1, Canada
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
Liouville family of distributions; Infinite mixture models; Proportional data; Nonparametric Bayesian inference; MCMC; Gibbs sampling; UNSUPERVISED SELECTION; DIRICHLET; CLASSIFICATION; DISTRIBUTIONS; ESTIMATORS;
DOI
10.1016/j.patrec.2011.09.037
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper addresses the problem of modeling and clustering proportional data using mixture models, a problem of great interest and importance for many practical pattern recognition, image processing, data mining, and computer vision applications. Finite mixture models are broadly applicable to clustering problems. However, they involve the challenging problem of selecting the number of clusters, which requires a trade-off. The number of clusters must be sufficient to provide the discriminating capability between clusters required for a given application. Indeed, if too many clusters are employed, overfitting may occur, and if too few are used, the model underfits. Here we approach the problem of modeling and clustering proportional data using infinite mixtures, which have been shown to be an efficient alternative to finite mixtures because they overcome the concern regarding the selection of the optimal number of mixture components. In particular, we propose and discuss an infinite Liouville mixture model whose parameter values are fitted to the data through a principled Bayesian algorithm that we have developed and that allows for uncertainty in the number of mixture components. Our experimental evaluation involves two challenging applications, namely text classification and texture discrimination, and suggests that the proposed approach can be an excellent choice for proportional data modeling. (C) 2011 Elsevier B.V. All rights reserved.
Pages: 103-110
Number of pages: 8