Speaker count: a new building block for speaker diarization

Cited by: 0

Authors:

Duong, Thanh Thi-Hien [1]

Nguyen, Phi-Le [2]

Nguyen, Hong-Son [3]

Nguyen, Duc-Chien [3]

Phan, Huy [4]

Duong, Ngoc Q. K. [5]

Affiliations:

[1] Hanoi Univ Min & Geol, Hanoi, Vietnam

[2] Hanoi Univ Sci & Technol, Hanoi, Vietnam

[3] Aimenext Join Stock Co, Ho Chi Minh City, Vietnam

[4] Queen Mary Univ London, London, England

[5] InterDigital, Issy Les Moulineaux, France

Source:

2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC) | 2021

DOI:

Not available

Chinese Library Classification:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

In daily communication, several people sometimes talk simultaneously, resulting in overlapped speech segments. Such segments challenge machine listening tasks like speaker diarization or speech recognition. This paper presents a speaker diarization framework that integrates speaker count, a building block that predicts the number of active speakers in each analysis window. Such a speaker count block can be developed independently of existing speaker diarization systems; its output is then used in the re-segmentation step of those systems to better label the active speakers in each window. We further investigate the effect of the analysis window size on diarization performance in an oracle setting. Our preliminary theoretical analysis shows that overlapped speech detection, a special case of speaker count, helps reduce the diarization error rate when the window size is small enough. Finally, experimental results obtained from two state-of-the-art diarization systems on a benchmark dataset confirm the potential benefit of the proposed approach.

Pages: 1149-1155

Page count: 7
