Automatic speaker clustering using a voice characteristic reference space and maximum purity estimation

Cited by: 13
Authors
Tsai, Wei-Ho [1]
Cheng, Shih-Sian
Wang, Hsin-Min
Affiliations
[1] Natl Taipei Univ Technol, Dept Elect Engn, Taipei 10608, Taiwan
[2] Acad Sinica, Inst Sci Informat, Taipei 115, Taiwan
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2007 / Vol. 15 / No. 04
Keywords
genetic algorithm; maximum purity estimation; speaker clustering
DOI
10.1109/TASL.2007.894525
Chinese Library Classification number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
This paper investigates the problem of automatically grouping unknown speech utterances based on their associated speakers. In attempts to determine which utterances should be grouped together, it is necessary to measure the voice similarities between utterances. Since most existing methods measure the inter-utterance similarities based directly on the spectrum-based features, the resulting clusters may not be well-related to speakers, but to various acoustic classes instead. This study remedies this shortcoming by projecting utterances onto a reference space trained to cover the generic voice characteristics underlying the whole utterance collection. The resultant projection vectors naturally reflect the relationships of voice similarities among all the utterances, and hence are more robust against interference from nonspeaker factors. Then, a clustering method based on maximum purity estimation is proposed, with the aim of maximizing the similarities between utterances within all the clusters. This method employs a genetic algorithm to determine the cluster to which each utterance should be assigned, which overcomes the limitation of conventional hierarchical clustering that the final result can only reach the local optimum. In addition, the proposed clustering method adapts a Bayesian information criterion to determine how many clusters should be created.
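The abstract gives no formulas, so the following is only a minimal sketch of the general idea it describes: scoring a candidate assignment of utterances to clusters by how similar the utterances within each cluster are. The function names `cosine` and `mean_within_cluster_similarity`, the use of cosine similarity, and the toy two-dimensional "projection vectors" are all assumptions made for illustration. This is not the paper's purity estimator, its reference-space projection, or its genetic algorithm.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_within_cluster_similarity(vectors, labels):
    """Average cosine similarity over all utterance pairs that share a label.

    Hypothetical stand-in objective: a search procedure (for example a
    genetic algorithm) could compare candidate label assignments by this score.
    """
    total, pairs = 0.0, 0
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if labels[i] == labels[j]:
                total += cosine(vectors[i], vectors[j])
                pairs += 1
    return total / pairs if pairs else 0.0


# Toy projection vectors (assumed): utterances 0 and 1 resemble each other,
# as do utterances 2 and 3.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]

grouped_by_similarity = [0, 0, 1, 1]  # similar utterances share a cluster
mixed_assignment = [0, 1, 0, 1]       # dissimilar utterances share a cluster

good_score = mean_within_cluster_similarity(vectors, grouped_by_similarity)
bad_score = mean_within_cluster_similarity(vectors, mixed_assignment)
print(good_score > bad_score)
```

In this toy case the assignment that groups similar vectors scores higher, which is the property a global search over assignments would exploit. Choosing the number of clusters (the role the abstract gives to a Bayesian information criterion) is outside this sketch.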
Pages: 1461-1474
Page count: 14