BIC-Based Speaker Segmentation Using Divide-and-Conquer Strategies With Application to Speaker Diarization

Cited by: 22
Authors
Cheng, Shih-Sian [1 ,2 ]
Wang, Hsin-Min [2 ]
Fu, Hsin-Chia [1 ]
Affiliations
[1] Natl Chiao Tung Univ, Dept Comp Sci, Hsinchu 300, Taiwan
[2] Acad Sinica, Inst Informat Sci, Taipei 115, Taiwan
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2010 / Vol. 18 / No. 01
Keywords
Bayesian information criterion (BIC); divide-and-conquer; speaker change detection; speaker diarization; speaker segmentation; AUDIO CLASSIFICATION; BROADCAST NEWS;
DOI
10.1109/TASL.2009.2024730
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
In this paper, we propose three divide-and-conquer approaches for Bayesian information criterion (BIC)-based speaker segmentation. The approaches detect speaker changes by recursively partitioning a large analysis window into two sub-windows and recursively verifying the merging of two adjacent audio segments using Delta BIC, a widely adopted distance measure between two audio segments. We compare our approaches to three popular distance-based approaches, namely, Chen and Gopalakrishnan's window-growing-based approach, Siegler et al.'s fixed-size sliding window approach, and Delacourt and Wellekens's DISTBIC approach, by performing computational cost analysis and conducting speaker change detection experiments on two broadcast news data sets. The results show that the proposed approaches are more efficient and achieve higher segmentation accuracy than the compared distance-based approaches. In addition, we apply the segmentation approaches discussed in this paper to the speaker diarization task. The experimental results show that a more effective segmentation approach leads to better diarization accuracy.
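As an illustration of the Delta BIC measure named in the abstract, the following is a minimal sketch of its commonly used form, assuming each segment is modelled by a single full-covariance Gaussian. The function name `delta_bic`, the penalty weight `lam`, and the synthetic feature vectors are assumptions made for this example; the sketch does not reproduce the paper's divide-and-conquer procedures.

```python
import numpy as np


def delta_bic(x, y, lam=1.0):
    """Delta BIC between two segments x and y (arrays of shape frames x dims).

    Assumes one full-covariance Gaussian per segment. A positive value
    favours modelling the segments separately (a speaker change); a
    negative value favours merging them. `lam` is a hypothetical name for
    the penalty weight.
    """
    z = np.vstack([x, y])
    n, d = z.shape

    def logdet_cov(a):
        # Maximum-likelihood covariance (bias=True) and its log-determinant.
        cov = np.atleast_2d(np.cov(a, rowvar=False, bias=True))
        _, value = np.linalg.slogdet(cov)
        return value

    # Model-complexity penalty: d mean parameters plus d(d+1)/2 covariance parameters.
    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)
    gain = 0.5 * (n * logdet_cov(z)
                  - len(x) * logdet_cov(x)
                  - len(y) * logdet_cov(y))
    return gain - lam * penalty


# Synthetic stand-ins for acoustic feature vectors (not real speech data).
rng = np.random.default_rng(0)
speaker_a = rng.normal(0.0, 1.0, size=(200, 4))
speaker_a_again = rng.normal(0.0, 1.0, size=(200, 4))
speaker_b = rng.normal(5.0, 1.0, size=(200, 4))

different = delta_bic(speaker_a, speaker_b)        # well-separated segments
same = delta_bic(speaker_a, speaker_a_again)       # same distribution
```

In this sketch, `different` comes out positive and `same` negative, which is the sign convention a change detector would threshold at zero.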
Pages: 141 - 157
Number of pages: 17
Related papers
39 in total
  • [1] ANGUERA X, 2005, TR05008 ICSI BERK U
  • [2] [Anonymous], 1998, Proc. DARPA Broadcast News Transcription and Understanding Workshop
  • [3] Bakis Raimo, 1997, Proceedings of the Speech Recognition Workshop, P67
  • [4] Multistage speaker diarization of broadcast news
    Barras, Claude
    Zhu, Xuan
    Meignier, Sylvain
    Gauvain, Jean-Luc
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2006, 14 (05) : 1505 - 1512
  • [5] Bonastre JF, 2000, INT CONF ACOUST SPEE, P1177
  • [6] Evaluation of BIC-based algorithms for audio segmentation
    Cettolo, M
    Vescovi, M
    Rizzi, R
    [J]. COMPUTER SPEECH AND LANGUAGE, 2005, 19 (02) : 147 - 170
  • [7] Discrimination power of vocal source and vocal tract related features for speaker segmentation
    Chan, Wai Nang
    Zheng, Nengheng
    Lee, Tan
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15 (06) : 1884 - 1892
  • [8] CHENG S, 2003, P 8 EUR C SPEECH COM, P945
  • [9] DISTBIC: A speaker-based segmentation for audio data indexing
    Delacourt, P
    Wellekens, CJ
    [J]. SPEECH COMMUNICATION, 2000, 32 (1-2) : 111 - 126
  • [10] How many clusters? Which clustering method? Answers via model-based cluster analysis
    Fraley, C
    Raftery, AE
    [J]. COMPUTER JOURNAL, 1998, 41 (08) : 578 - 588