K-means and gaussian mixture modeling with a separation constraint

被引:0
作者
Jiang, He [1 ]
Arias-Castro, Ery [2 ]
机构
[1] Calif State Polytech Univ Pomona, Dept Math & Stat, Pomona, CA 91768 USA
[2] Univ Calif San Diego, Dept Math, La Jolla, CA USA
基金
美国国家科学基金会;
关键词
Clustering; Dynamic programming; Gaussian mixture models; K-means; Separation constraint; MAXIMUM-LIKELIHOOD-ESTIMATION; RESTRICTED EM ALGORITHM;
D O I
10.1080/03610918.2024.2354747
中图分类号
O21 [概率论与数理统计]; C8 [统计学];
学科分类号
020208 ; 070103 ; 0714 ;
摘要
We consider the problem of clustering with K-means and Gaussian mixture models with a constraint on the separation between the centers in the context of real-valued data. We first propose a dynamic programming approach to solving the K-means problem with a separation constraint on the centers, building on Wang and Song (2011). In the context of fitting a Gaussian mixture model, we then propose an EM algorithm that incorporates such a constraint. A separation constraint can help regularize the output of a clustering algorithm, and we provide both simulated and real data examples to illustrate this point.
引用
收藏
页数:15
相关论文
共 47 条
[1]  
[Anonymous], 2001, Data Table of Stature-for-age Charts
[2]  
[Anonymous], 2010, National youth physical activity and nutrition study (NYPANS)
[3]   Scalable clustering algorithms with balancing constraints [J].
Banerjee, Arindam ;
Ghosh, Joydeep .
DATA MINING AND KNOWLEDGE DISCOVERY, 2006, 13 (03) :365-395
[4]  
Basu S, 2004, SIAM PROC S, P333
[5]  
Basu S, 2009, CH CRC DATA MIN KNOW, P1
[6]  
Bennett KristinP., 2000, CONSTRAINED K MEANS
[7]  
Chauveau D., 2013, ECM and MM algorithms for Normal mixtures with constrained parameters
[8]  
Davidson I., 2007, ACM Trans Knowl Discov Data, V1, P1
[9]  
Davidson I, 2005, SIAM PROC S, P138
[10]   Testing for generalized linear mixed models with cluster correlated data under linear inequality constraints [J].
Davis, Karelyn A. ;
Park, Chul G. ;
Sinha, Sanjoy K. .
CANADIAN JOURNAL OF STATISTICS-REVUE CANADIENNE DE STATISTIQUE, 2012, 40 (02) :243-258