A distributed EM algorithm to estimate the parameters of a finite mixture of components

被引:19
作者
Safarinejadian, Behrooz [1 ]
Menhaj, Mohammad B. [1 ]
Karrari, Mehdi [1 ]
机构
[1] Amirkabir Univ Technol, Dept Elect Engn, Tehran, Iran
关键词
Distributed data mining; EM algorithm; Data clustering; Density estimation; GENE-EXPRESSION DATA; ACCELERATING EM;
D O I
10.1007/s10115-009-0218-y
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
In this paper, a distributed expectation maximization (DEM) algorithm is first introduced in a general form for estimating the parameters of a finite mixture of components. This algorithm is used for density estimation and clustering of data distributed over nodes of a network. Then, a distributed incremental EM algorithm (DIEM) with a higher convergence rate is proposed. After a full derivation of distributed EM algorithms, convergence of these algorithms is analyzed based on the negative free energy concept used in statistical physics. An analytical approach is also developed for evaluating the convergence rate of both incremental and distributed incremental EM algorithms. It is analytically shown that the convergence rate of DIEM is much faster than that of the DEM algorithm. Finally, simulation results approve that DIEM remarkably outperforms DEM for both synthetic and real data sets.
引用
收藏
页码:267 / 292
页数:26
相关论文
共 36 条
[1]  
[Anonymous], 1999, Learning in Graphical Models
[2]   Clustering multidimensional sequences in spatial and temporal databases [J].
Assent, Ira ;
Krieger, Ralph ;
Glavic, Boris ;
Seidl, Thomas .
KNOWLEDGE AND INFORMATION SYSTEMS, 2008, 16 (01) :29-51
[3]  
BESAG J, 1986, J R STAT SOC B, V48, P259
[4]   Multi-step density-based clustering [J].
Brecheisen, S ;
Kriegel, HP ;
Pfeifle, M .
KNOWLEDGE AND INFORMATION SYSTEMS, 2006, 9 (03) :284-308
[5]   Collective mining of Bayesian networks from distributed heterogeneous data [J].
Chen, R ;
Sivakumar, K ;
Kargupta, H .
KNOWLEDGE AND INFORMATION SYSTEMS, 2004, 6 (02) :164-187
[6]  
Dasgupta S., 1999, Proceedings of the 40th Annual Symposium on Foundations of Computer Science, FOCS'99, page, V40, P634
[7]   Distributed data mining in peer-to-peer networks [J].
Datta, Souptik ;
Bhaduri, Kanishka ;
Giannella, Chris ;
Kargupta, Hillol ;
Wolff, Ran .
IEEE INTERNET COMPUTING, 2006, 10 (04) :18-26
[8]   MAXIMUM LIKELIHOOD FROM INCOMPLETE DATA VIA EM ALGORITHM [J].
DEMPSTER, AP ;
LAIRD, NM ;
RUBIN, DB .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1977, 39 (01) :1-38
[9]  
DUTTA S, 2005, 8 INT WORKSH HIGH PE
[10]   Unsupervised learning of finite mixture models [J].
Figueiredo, MAT ;
Jain, AK .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2002, 24 (03) :381-396