Speaker Diarization Exploiting the Eigengap Criterion and Cluster Ensembles

Cited by: 15

Authors:

Bassiou, Nikoletta [1]

Moschou, Vassiliki [1]

Kotropoulos, Constantine [1]

Affiliations:

[1] Aristotle Univ Thessaloniki, Dept Informat, Thessaloniki 54124, Greece

Source:

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2010, Vol. 18, Issue 08

Keywords:

Broadcasts; cluster ensembles; eigengap criterion; movie scene analysis; speaker clustering; speaker diarization; two-person dialogues; SEGMENTATION; RECOGNITION;

DOI:

10.1109/TASL.2010.2042121

Chinese Library Classification number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

A novel system for speaker diarization is proposed that combines the eigengap criterion and cluster ensembles. No explicit assumptions on the number of speakers are made. Two variants of the system are developed. The first variant does not cluster the speech segments that are detected as outliers, while the second one does. Both variants are assessed with respect to various metrics, such as the overall classification error, the average cluster purity, and the average speaker purity. Experiments are conducted on two-person dialogue scenes in movies as well as on news broadcasts from the MDE RT-03 Training Data Speech Corpus released by the U.S. National Institute of Standards and Technology. In the latter case, the diarization error rate is also reported. It is demonstrated that the clustering performance does not degrade when outliers are present. Moreover, thanks to the eigengap criterion, the evaluation metrics are improved.

Pages: 2134-2144

Page count: 11
