Speaker Diarization Exploiting the Eigengap Criterion and Cluster Ensembles

Cited by: 15
Authors
Bassiou, Nikoletta [1]
Moschou, Vassiliki [1]
Kotropoulos, Constantine [1]
Affiliation
[1] Aristotle Univ Thessaloniki, Dept Informat, Thessaloniki 54124, Greece
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2010 / Vol. 18 / No. 8
Keywords
Broadcasts; cluster ensembles; eigengap criterion; movie scene analysis; speaker clustering; speaker diarization; two-person dialogues; SEGMENTATION; RECOGNITION
DOI
10.1109/TASL.2010.2042121
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
A novel speaker diarization system that combines the eigengap criterion with cluster ensembles is proposed. No explicit assumption about the number of speakers is made. Two variants of the system are developed. The first does not cluster the speech segments detected as outliers, while the second does. Both variants are assessed with respect to several metrics, including the overall classification error, the average cluster purity, and the average speaker purity. Experiments are conducted on two-person dialogue scenes from movies and on news broadcasts from the MDE RT-03 Training Data Speech Corpus released by the U.S. National Institute of Standards and Technology (NIST). For the news broadcasts, the diarization error rate is also reported. It is demonstrated that clustering performance does not degrade when outliers are present. Moreover, the use of the eigengap criterion improves the evaluation metrics.
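The abstract relies on the eigengap criterion to avoid fixing the number of speakers in advance. The criterion is a standard spectral-clustering heuristic for estimating how many clusters a similarity graph contains. The sketch below shows only that generic heuristic. It assumes a symmetric, non-negative affinity matrix between speech segments and uses the symmetric normalized Laplacian. The helper names `gaussian_affinity` and `estimate_num_speakers`, the Gaussian kernel, the `sigma` and `max_speakers` parameters, and the synthetic 2-D "segment features" are all hypothetical illustrations. They are not the authors' features, affinity, or exact formulation, and the cluster-ensemble stage is not shown.

```python
import numpy as np


def gaussian_affinity(features: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Hypothetical affinity: Gaussian kernel on Euclidean distances between
    per-segment feature vectors (one row per speech segment)."""
    diffs = features[:, None, :] - features[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def estimate_num_speakers(affinity: np.ndarray, max_speakers: int = 10) -> int:
    """Generic eigengap heuristic (an assumption, not the paper's exact rule):
    choose k at the largest gap between consecutive smallest eigenvalues of
    the symmetric normalized graph Laplacian."""
    a = np.asarray(affinity, dtype=float)
    n = a.shape[0]
    if n < 2:
        return n
    degrees = a.sum(axis=1)  # assumed strictly positive (self-similarity > 0)
    inv_sqrt_deg = 1.0 / np.sqrt(degrees)
    # L_sym = I - D^{-1/2} A D^{-1/2}
    laplacian = np.eye(n) - inv_sqrt_deg[:, None] * a * inv_sqrt_deg[None, :]
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order
    eigvals = np.linalg.eigvalsh(laplacian)
    k_max = min(max_speakers, n - 1)
    # gaps[i] = lambda_{i+2} - lambda_{i+1}; the largest gap at index i suggests k = i + 1
    gaps = np.diff(eigvals[: k_max + 1])
    return int(np.argmax(gaps)) + 1


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic, hypothetical "segment features": three well-separated groups
    centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
    points = np.vstack([c + 0.5 * rng.standard_normal((20, 2)) for c in centers])
    print(estimate_num_speakers(gaussian_affinity(points)))
```

The rule works because a graph with k disconnected components has exactly k zero Laplacian eigenvalues. When the components are only weakly connected, the first k eigenvalues stay near zero and a visible gap follows them. In practice the estimate depends heavily on how the affinity is built, such as the kernel width `sigma` here.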
Pages: 2134-2144
Number of pages: 11