Probabilistic Speaker Diarization With Bag-of-Words Representations of Speaker Angle Information

Cited by: 14
Authors
Ishiguro, Katsuhiko [1]
Yamada, Takeshi
Araki, Shoko [1]
Nakatani, Tomohiro [1]
Sawada, Hiroshi [1]
Affiliations
[1] NTT Corp, NTT Commun Sci Labs, Kyoto 6190237, Japan
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2012 / Vol. 20 / No. 02
Keywords
Bag-of-words (BOW); clustering; direction of arrival (DOA); latent Dirichlet allocation (LDA); speaker diarization; microphone arrays; variational Bayes inference; LECTURE;
DOI
10.1109/TASL.2011.2151858
Chinese Library Classification (CLC) number
O42 [Acoustics];
Subject classification codes
070206; 082403;
Abstract
Speaker diarization determines "who spoke when" from the recorded conversations of an unknown number of people. In general, we have no a priori information about the number, the locations, or even the characteristics of the speakers. Additionally, speakers' speech utterances vary dynamically because of turn-taking during the conversations. These conditions make the speaker-clustering task extremely difficult. The problem becomes even harder if online (incremental) processing is required. In this paper, we formulate the speaker-clustering problem as the clustering of the sequential audio features generated by an unknown number of latent mixture components (speakers). We employ a probabilistic model that assumes time-sensitive speaker mixtures at every time frame, which, surprisingly, suits the diarization scenario. We combine the time-varying probabilistic model with direction of arrival (DOA) information calculated from a microphone array in a bag-of-words (BoW)-style feature representation. The proposed system effectively estimates the number and locations of the speakers in an online manner based on the standard Bayes inference scheme. Experiments confirm that the proposed model can successfully infer the number and features of speakers and yield better or comparable speaker diarization results compared with conventional methods in several datasets.
Pages: 447-460
Page count: 14
Related papers
50 in total
  • [1] Speaker Diarization with Lexical Information
    Park, Tae Jin
    Han, Kyu J.
    Huang, Jing
    He, Xiaodong
    Zhou, Bowen
    Georgiou, Panayiotis
    Narayanan, Shrikanth
    INTERSPEECH 2019, 2019, : 391 - 395
  • [2] SPEAKER CHANGE DETECTION AND SPEAKER DIARIZATION USING SPATIAL INFORMATION
    Hu, Mathieu
    Sharma, Dushyant
    Doclo, Simon
    Brookes, Mike
    Naylor, Patrick A.
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 5743 - 5747
  • [3] Speaker Diarization and Detection System using A Priori Speaker Information
    Kenai, Ouassila
    Asbai, Nassim
    Ouamour, Siham
    Guerti, Mhania
    Djeghiour, Salim
    2018 2ND INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE AND SPEECH PROCESSING (ICNLSP), 2018, : 73 - 78
  • [4] Efficient use of overlap information in speaker diarization
    Otterson, Scott
    Ostendorf, Mari
    2007 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, VOLS 1 AND 2, 2007, : 683 - 686
  • [5] Speaker Diarization Using a priori Acoustic Information
    Aronowitz, Hagai
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 944 - 947
  • [6] Leveraging speaker attribute information using multi task learning for speaker verification and diarization
    Luu, Chau
    Bell, Peter
    Renals, Steve
    INTERSPEECH 2021, 2021, : 491 - 495
  • [7] Exploring Speaker-Related Information in Spoken Language Understanding for Better Speaker Diarization
    Cheng, Luyao
    Zheng, Siqi
    Zhang, Qinglin
    Wang, Hui
    Chen, Yafeng
    Chen, Qian
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023, : 14068 - 14077
  • [8] Fusing Audio and Video Information for Online Speaker Diarization
    Schmalenstroeer, Joerg
    Kelling, Martin
    Leutnant, Volker
    Haeb-Umbach, Reinhold
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 1159 - 1162
  • [9] Agglomerative Information Bottleneck for speaker diarization of meetings data
    Vijayasenan, Deepu
    Valente, Fabio
    Bourlard, Herve
    2007 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, VOLS 1 AND 2, 2007, : 250 - 255
  • [10] An Information Theoretic Approach to Speaker Diarization of Meeting Data
    Vijayasenan, Deepu
    Valente, Fabio
    Bourlard, Herve
    IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2009, 17 (07): : 1382 - 1393