On Clustering Histograms with k-Means by Using Mixed α-Divergences

Cited by: 21
Authors
Nielsen, Frank [1, 2]
Nock, Richard [3, 4]
Amari, Shun-ichi [5]
Affiliations
[1] Sony Comp Sci Labs Inc, Tokyo 1410022, Japan
[2] Ecole Polytech, F-91128 Palaiseau, France
[3] NICTA, Alexandria, NSW 1435, Australia
[4] Australian Natl Univ, Alexandria, NSW 1435, Australia
[5] RIKEN, Brain Sci Inst, Wako, Saitama 3510198, Japan
Funding
Australian Research Council
Keywords
bag-of-X; alpha-divergence; Jeffreys divergence; centroid; k-means clustering; k-means seeding; APPROXIMATION; DISTANCE;
DOI
10.3390/e16063273
Chinese Library Classification number
O4 [Physics]
Subject classification code
0702
Abstract
Clustering sets of histograms has become popular thanks to the success of the generic method of bag-of-X used in text categorization and in visual categorization applications. In this paper, we investigate the use of a parametric family of distortion measures, called the alpha-divergences, for clustering histograms. Since it usually makes sense to deal with symmetric divergences in information retrieval systems, we symmetrize the alpha-divergences using the concept of mixed divergences. First, we present a novel extension of k-means clustering to mixed divergences. Second, we extend the k-means++ seeding to mixed alpha-divergences and report a guaranteed probabilistic bound. Finally, we describe a soft clustering technique for mixed alpha-divergences.
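The abstract does not spell out the divergences it refers to, so the following is only a minimal sketch under stated assumptions. It assumes the commonly used Amari form of the α-divergence for positive arrays (α ≠ ±1) and a mixed divergence of the form M_λ(p : q : r) = λ D(p : q) + (1 − λ) D(q : r). The function names are hypothetical and chosen for illustration only.

```python
def alpha_divergence(p, q, alpha):
    """Assumed Amari alpha-divergence between positive arrays p and q.

    D_alpha(p : q) = 4 / (1 - alpha^2)
                     * sum_i [ a * p_i + b * q_i - p_i^a * q_i^b ],
    with a = (1 - alpha) / 2 and b = (1 + alpha) / 2.
    The limit cases alpha = +1 and alpha = -1 are not handled here.
    """
    if abs(alpha) == 1:
        raise ValueError("alpha = +1 or -1 is a limit case, not covered by this sketch")
    a = (1 - alpha) / 2
    b = (1 + alpha) / 2
    total = sum(a * pi + b * qi - (pi ** a) * (qi ** b) for pi, qi in zip(p, q))
    return 4.0 / (1 - alpha ** 2) * total


def mixed_divergence(p, q, r, alpha, lam):
    """Assumed mixed divergence: lam * D(p : q) + (1 - lam) * D(q : r)."""
    return (lam * alpha_divergence(p, q, alpha)
            + (1 - lam) * alpha_divergence(q, r, alpha))


def symmetrized_alpha_divergence(p, q, alpha):
    """Arithmetic symmetrization: the mixed divergence with r = p, lam = 1/2."""
    return mixed_divergence(p, q, p, alpha, 0.5)


# Usage on two toy histograms (hypothetical data).
p = [0.2, 0.8]
q = [0.5, 0.5]
d_forward = alpha_divergence(p, q, 0.5)
d_symmetric = symmetrized_alpha_divergence(p, q, 0.5)
```

Under these assumed definitions, swapping the arguments corresponds to negating α, which is why the symmetrized version can be written as a mixed divergence with the same histogram on both outer sides.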
Pages: 3273-3301
Number of pages: 29