Speaker Diarization with Lexical Information

Cited by: 11
Authors
Park, Tae Jin [1]
Han, Kyu J. [2]
Huang, Jing [2]
He, Xiaodong [2]
Zhou, Bowen [2]
Georgiou, Panayiotis [1]
Narayanan, Shrikanth [1]
Affiliations
[1] Univ Southern Calif, Los Angeles, CA 90089 USA
[2] JD AI Res, Beijing, Peoples R China
Source
Keywords
speaker diarization; automatic speech recognition; lexical information; adjacency matrix integration; spectral clustering; SPEECH; TRANSCRIPTION; SEGMENTATION;
DOI
10.21437/Interspeech.2019-1947
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology];
Discipline classification code
100104; 100213;
Abstract
This work presents a novel approach for speaker diarization to leverage lexical information provided by automatic speech recognition. We propose a speaker diarization system that can incorporate word-level speaker turn probabilities with speaker embeddings into a speaker clustering process to improve the overall diarization accuracy. To integrate lexical and acoustic information in a comprehensive way during clustering, we introduce an adjacency matrix integration for spectral clustering. Since words and word boundary information for word-level speaker turn probability estimation are provided by a speech recognition system, our proposed method works without any human intervention for manual transcriptions. We show that the proposed method improves diarization performance on various evaluation datasets compared to the baseline diarization system using acoustic information only in speaker embeddings.
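The idea of combining lexical and acoustic evidence in one adjacency matrix before spectral clustering can be sketched as follows. This is only an illustrative toy under stated assumptions, not the authors' system: the abstract does not specify how the two matrices are integrated, so the element-wise maximum used here, the cosine-similarity affinity, the 0.5 turn threshold, the two-speaker restriction (a sign split on the Fiedler vector), and all function names and data are hypothetical.

```python
import numpy as np

def acoustic_adjacency(embeddings):
    """Cosine similarity between per-word speaker embeddings, clipped to [0, 1]."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return np.clip(unit @ unit.T, 0.0, 1.0)

def lexical_adjacency(turn_probs, threshold=0.5):
    """Link all words that lie in the same run between likely speaker turns.

    turn_probs[i] is an assumed probability of a speaker turn between
    word i and word i + 1 (as might come from a lexical turn estimator).
    """
    is_turn = (np.asarray(turn_probs) >= threshold).astype(int)
    run_id = np.concatenate([[0], np.cumsum(is_turn)])
    return (run_id[:, None] == run_id[None, :]).astype(float)

def integrate(acoustic, lexical):
    """Assumed integration rule: element-wise maximum of the two matrices."""
    return np.maximum(acoustic, lexical)

def two_speaker_spectral(adjacency):
    """Split nodes by the sign of the Fiedler vector of the graph Laplacian."""
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    _, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    labels = (eigvecs[:, 1] > 0).astype(int)
    if labels[0] == 1:  # fix the label of the first word to 0
        labels = 1 - labels
    return labels.tolist()

# Toy data: words 0-3 belong to one speaker, words 4-6 to another.
# Word 3 has a noisy embedding that is acoustically closer to the second speaker.
embeddings = np.array([
    [1.0, 0.05], [1.0, 0.05], [1.0, 0.05],
    [0.6, 0.8],
    [0.05, 1.0], [0.05, 1.0], [0.05, 1.0],
])
# Assumed lexical estimate: a likely speaker turn only between words 3 and 4.
turn_probs = [0.1, 0.1, 0.1, 0.9, 0.1, 0.1]

acoustic = acoustic_adjacency(embeddings)
lexical = lexical_adjacency(turn_probs)

acoustic_labels = two_speaker_spectral(acoustic)
integrated_labels = two_speaker_spectral(integrate(acoustic, lexical))
print("acoustic only:", acoustic_labels)
print("integrated:   ", integrated_labels)
```

In this toy setup the noisy word 3 is grouped with the second speaker when only the acoustic matrix is used, and moves back to the first speaker once the lexical links are merged in. A real system would need to handle an unknown number of speakers and would estimate the turn probabilities from ASR output rather than hard-coding them.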
Pages: 391-395
Page count: 5