Model complexity selection and cross-validation EM training for robust speaker diarization

Cited by: 0
Authors
Anguera, Xavier [1, 2]
Shinozaki, Takahiro [3, 4]
Wooters, Chuck
Hernando, Javier [2]
Affiliations
[1] Int Comp Sci Inst, Berkeley, CA 94704 USA
[2] Tech Univ Catalonia UPC, Barcelona 08034, Spain
[3] Univ Washington, Dept Elect Engn, Seattle, WA 98195 USA
[4] Kyoto Univ, Kyoto 6068501, Japan
Source
2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3 | 2007
Keywords
speaker diarization; speaker segmentation and clustering; complexity selection; cross-validation EM training
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Accurate modeling of speaker clusters is important in the task of speaker diarization. Creating accurate models involves both selecting the model complexity and training optimally on the available data. Models of fixed complexity trained with the standard EM algorithm risk overfitting, which can reduce diarization performance. In this paper a technique proposed by the author to estimate the complexity of a model is combined with a novel training algorithm called "Cross-Validation EM" to control the number of training iterations. This combination leads to more robust speaker modeling and improves speaker diarization performance. Tests on the NIST RT (MDM) datasets for meetings show a relative improvement of 10.6% on the test set.
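The idea of using unseen data to decide when to stop EM training can be illustrated with a minimal sketch. This is a simplified stand-in, not the Cross-Validation EM algorithm of the paper: it trains a 1-D Gaussian mixture with standard EM and stops as soon as the log-likelihood on a separate held-out set no longer improves. All function names, the variance floor, and the single train/held-out split are assumptions made for illustration only.

```python
import math
import random


def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def component_terms(x, weights, means, variances):
    """Per-component log(weight * density) for one sample."""
    return [math.log(w) + log_gauss(x, m, v)
            for w, m, v in zip(weights, means, variances)]


def log_likelihood(data, weights, means, variances):
    """Total log-likelihood of data under the mixture (log-sum-exp)."""
    total = 0.0
    for x in data:
        terms = component_terms(x, weights, means, variances)
        top = max(terms)
        total += top + math.log(sum(math.exp(t - top) for t in terms))
    return total


def em_step(data, weights, means, variances, var_floor=1e-3):
    """One standard EM iteration on the training data."""
    k = len(weights)
    occ, s1, s2 = [0.0] * k, [0.0] * k, [0.0] * k
    for x in data:
        # E-step: posterior probability of each component for this sample.
        terms = component_terms(x, weights, means, variances)
        top = max(terms)
        post = [math.exp(t - top) for t in terms]
        norm = sum(post)
        for j in range(k):
            g = post[j] / norm
            occ[j] += g
            s1[j] += g * x
            s2[j] += g * x * x
    # M-step: re-estimate parameters from the accumulated statistics.
    occ = [max(c, 1e-10) for c in occ]
    total = sum(occ)
    new_w = [c / total for c in occ]
    new_m = [s1[j] / occ[j] for j in range(k)]
    new_v = [max(s2[j] / occ[j] - new_m[j] ** 2, var_floor) for j in range(k)]
    return new_w, new_m, new_v


def train_with_heldout_stopping(train, heldout, k, max_iters=50, seed=0):
    """Run EM on `train`; stop when held-out log-likelihood stops improving."""
    rng = random.Random(seed)
    mean_all = sum(train) / len(train)
    var_all = sum((x - mean_all) ** 2 for x in train) / len(train)
    params = ([1.0 / k] * k, rng.sample(train, k), [var_all] * k)
    best_params = params
    best_ll = log_likelihood(heldout, *params)
    iters_used = 0
    for _ in range(max_iters):
        params = em_step(train, *params)
        ll = log_likelihood(heldout, *params)
        if ll <= best_ll:
            break  # held-out likelihood no longer improves: stop training
        best_params, best_ll = params, ll
        iters_used += 1
    return best_params, iters_used, best_ll


if __name__ == "__main__":
    rng = random.Random(1)
    data = [rng.gauss(-2.0, 1.0) for _ in range(200)]
    data += [rng.gauss(3.0, 0.5) for _ in range(200)]
    rng.shuffle(data)
    (w, m, v), iters, ll = train_with_heldout_stopping(data[:300], data[300:], k=2)
    print("iterations:", iters, "held-out log-likelihood:", ll)
```

A single held-out split sets aside data that the model never trains on, which is costly when speaker clusters are small; that cost is one motivation for cross-validation schemes that reuse all of the data.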
Pages: 273 / +
Page count: 2
Related papers
10 items in total
[1] AJMERA J, 2003, P ASRU US VIRG ISL U
[2] ANGUERA X, 2006, MLMI 06 WASH DC USA
[3] ANGUERA X, 2006, RT06S M REC EV WASH
[4] [Anonymous], P EUR 1997
[5] Chen SS, 1998, INT CONF ACOUST SPEE, P645, DOI 10.1109/ICASSP.1998.675347
[6] MAXIMUM LIKELIHOOD FROM INCOMPLETE DATA VIA EM ALGORITHM [J]. DEMPSTER, AP; LAIRD, NM; RUBIN, DB. JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1977, 39 (01): 1-38
[7] Friedman J, 2001, The elements of statistical learning, V1, DOI 10.1007/978-0-387-21606-5
[8] SHINOZAKI T, 2007, UNPUB P ICASSP
[9] Young S., 2005, HTK BOOK
[10] ZHU X, 2005, P ICSLP LISB PORT SE