Infinite Liouville mixture models with application to text and texture categorization

Cited by: 37
Author
Bouguila, Nizar [1]
Affiliation
[1] Concordia Univ, Fac Engn & Comp Sci, Concordia Inst Informat Syst Engn, Montreal, PQ H3G 2W1, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
Liouville family of distributions; Infinite mixture models; Proportional data; Nonparametric Bayesian inference; MCMC; Gibbs sampling; UNSUPERVISED SELECTION; DIRICHLET; CLASSIFICATION; DISTRIBUTIONS; ESTIMATORS;
DOI
10.1016/j.patrec.2011.09.037
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper addresses the problem of modeling and clustering proportional data using mixture models, a problem of great interest and importance for many practical applications in pattern recognition, image processing, data mining and computer vision. Finite mixture models are broadly applicable to clustering problems. However, they involve the challenging problem of selecting the number of clusters, which requires a trade-off. The number of clusters must be sufficient to provide the discriminating capability between clusters required for a given application. Indeed, if too many clusters are employed, overfitting may occur, and if too few are used, underfitting results. Here we approach the problem of modeling and clustering proportional data using infinite mixtures, which have been shown to be an efficient alternative to finite mixtures because they remove the need to select the optimal number of mixture components. In particular, we propose and discuss an infinite Liouville mixture model whose parameter values are fitted to the data through a principled Bayesian algorithm that we have developed and which allows for uncertainty in the number of mixture components. Our experimental evaluation involves two challenging applications, namely text classification and texture discrimination, and suggests that the proposed approach can be an excellent choice for proportional data modeling. (C) 2011 Elsevier B.V. All rights reserved.
Pages: 103-110
Number of pages: 8