High-dimensional count data clustering based on an exponential approximation to the multinomial Beta-Liouville distribution

Cited by: 11

Authors:

Zamzami, Nuha [1,2]

Bouguila, Nizar [1]

Affiliations:

[1] Concordia Univ, Concordia Inst Informat Syst Engn CIISE, Montreal, PQ, Canada

[2] Univ Jeddah, Coll Comp Sci & Engn, Jeddah, Saudi Arabia

Source:

INFORMATION SCIENCES | 2020 / Vol. 524

Keywords:

Exponential family; Finite mixtures; Model selection; Count data; CEM; Probabilistic kernels; SHAPES

DOI:

10.1016/j.ins.2020.03.028

Chinese Library Classification (CLC):

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

In this paper, we propose a mixture model for high-dimensional count data clustering based on an exponential-family approximation of the Multinomial Beta-Liouville distribution, which we call EMBL. We deal simultaneously with the problems of fitting the model to observed data and selecting the number of components. The learning algorithm automatically selects the optimal number of components and avoids several drawbacks of the standard Expectation-Maximization algorithm, including sensitivity to initialization and possible convergence to the boundary of the parameter space. We demonstrate the effectiveness and robustness of the proposed clustering approach through extensive empirical experiments involving challenging real-world applications. The results reveal that the proposed model strives to achieve higher accuracy than state-of-the-art generative models for count data clustering. Furthermore, the superior performance of EMBL demonstrates its flexibility and its ability to address the burstiness phenomenon successfully, and shows its computational efficiency, especially when dealing with sparse high-dimensional vectors. (C) 2020 Elsevier Inc. All rights reserved.

Pages: 116-135

Page count: 20
