Global and componentwise extrapolation for accelerating data mining from large incomplete data sets with the EM algorithm

被引:0
作者
Hsu, Chun-Nan [1 ]
Huang, Han-Shen [1 ]
Yang, Bo-Hou [1 ]
机构
[1] Acad Sinica, Inst Informat Sci, Taipei, Taiwan
来源
ICDM 2006: SIXTH INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS | 2006年
关键词
D O I
暂无
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
The Expectation-Maximization (EM) algorithm is one of the most popular-algorithms for data mining from incomplete data. However when applied to large data sets with a large proportion of missing data, the EM algorithm may converge slowly.. The triple jump extrapolation method can effectively accelerate the EM algorithm by substantially reducing the number of iterations required for EM to converge. There are two options for the triple jump method, global extrapolation (TJEM) and componentwise extrapolation (CTJEM). We tiled these two methods for a variety of probabilistic models and found that in general, global extraplolation yields a better performance, but there are cases where componentwise extrapolation yields very high speedup. In this paper we investigate when componentwise extrapolation should be preferred. We conclude that, when the Jacobian of the EM mapping is diagonal or block diagonal, CTJEM should be preferred. We show how to determine whether a Jacobian is diagonal or block diagonal and experimentally confirm our claim. In particular we show that CTJEM is especially effective for the semi-supervised Bayesian classifier model given a highly sparse data set.
引用
收藏
页码:265 / +
页数:4
相关论文
共 16 条
[1]  
[Anonymous], 1997, WILEY SERIES PROBABI
[2]  
Bauer Eric., 1997, Proceedings of the 13th Conference on Uncertainty in Artifical Intelligence, P3
[3]  
Burden R. L., 1988, NUMERICAL ANAL, V5
[4]  
COOPER GF, 1992, MACH LEARN, V9, P309, DOI 10.1007/BF00994110
[5]   MAXIMUM LIKELIHOOD FROM INCOMPLETE DATA VIA EM ALGORITHM [J].
DEMPSTER, AP ;
LAIRD, NM ;
RUBIN, DB .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1977, 39 (01) :1-38
[6]  
HESTERBERG T, 2005, P STAT COMP SECT AM
[7]  
Huang HS, 2005, Fifth IEEE International Conference on Data Mining, Proceedings, P649
[8]  
HUANG HS, UNPUB TRIPLE JUMP AI
[9]   Acceleration of the EM algorithm by using quasi-Newton methods [J].
Jamshidian, M ;
Jennrich, RI .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1997, 59 (03) :569-587
[10]  
LOUIS TA, 1982, J ROY STAT SOC B MET, V44, P226