Speaker Diarization with Lexical Information

Cited by: 11

Authors:

Park, Tae Jin [1]

Han, Kyu J. [2]

Huang, Jing [2]

He, Xiaodong [2]

Zhou, Bowen [2]

Georgiou, Panayiotis [1]

Narayanan, Shrikanth [1]

Affiliations:

[1] University of Southern California, Los Angeles, CA 90089, USA

[2] JD AI Research, Beijing, People's Republic of China

Source:

INTERSPEECH 2019 | 2019

Keywords:

speaker diarization; automatic speech recognition; lexical information; adjacency matrix integration; spectral clustering; SPEECH; TRANSCRIPTION; SEGMENTATION;

DOI:

10.21437/Interspeech.2019-1947

Chinese Library Classification (CLC) number:

R36 [Pathology]; R76 [Otorhinolaryngology];

Discipline classification code:

100104 ; 100213 ;

Abstract:

This work presents a novel approach for speaker diarization to leverage lexical information provided by automatic speech recognition. We propose a speaker diarization system that can incorporate word-level speaker turn probabilities with speaker embeddings into a speaker clustering process to improve the overall diarization accuracy. To integrate lexical and acoustic information in a comprehensive way during clustering, we introduce an adjacency matrix integration for spectral clustering. Since words and word boundary information for word-level speaker turn probability estimation are provided by a speech recognition system, our proposed method works without any human intervention for manual transcriptions. We show that the proposed method improves diarization performance on various evaluation datasets compared to the baseline diarization system using acoustic information only in speaker embeddings.
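
The abstract names the idea of integrating lexical and acoustic adjacency matrices before spectral clustering but gives no formulas. The following is a minimal Python sketch of that general idea, not the authors' method. Everything in it is an assumption made for illustration: the function names, the 0.5 turn-probability threshold, the element-wise maximum used to merge the two matrices, and the two-speaker split based on the sign of the Fiedler vector.

```python
import numpy as np


def cosine_affinity(embeddings):
    """Acoustic affinity: cosine similarity between segment embeddings, clipped to [0, 1]."""
    embeddings = np.asarray(embeddings, dtype=float)
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return np.clip(unit @ unit.T, 0.0, 1.0)


def lexical_adjacency(turn_probs, threshold=0.5):
    """Lexical adjacency from speaker turn probabilities (hypothetical construction).

    turn_probs[i] is the assumed probability of a speaker turn between
    segment i and segment i + 1. Segments not separated by a turn above
    the threshold are grouped together and connected with weight 1.
    """
    turns = np.asarray(turn_probs) > threshold
    group = np.concatenate([[0], np.cumsum(turns)])
    return (group[:, None] == group[None, :]).astype(float)


def integrate(acoustic, lexical):
    """Merge the two matrices; element-wise maximum is an assumed choice."""
    return np.maximum(acoustic, lexical)


def spectral_split(adjacency):
    """Two-way spectral partition using the sign of the Fiedler vector."""
    degree = adjacency.sum(axis=1)
    laplacian = np.diag(degree) - adjacency
    _, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]
    return (fiedler > 0).astype(int)


# Toy data: six segments, three per speaker (made-up embeddings).
embeddings = [[1.0, 0.1], [0.9, 0.0], [1.0, 0.2],
              [0.1, 1.0], [0.0, 0.9], [0.2, 1.0]]
turn_probs = [0.1, 0.2, 0.9, 0.1, 0.3]  # one likely turn after segment 2

combined = integrate(cosine_affinity(embeddings), lexical_adjacency(turn_probs))
labels = spectral_split(combined)
print(labels)
```

The label values themselves are arbitrary because the sign of an eigenvector is not fixed; only the grouping of segments is meaningful. A practical system would also need to estimate the number of speakers rather than assume two.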

Pages: 391-395

Number of pages: 5

