Speaker count: a new building block for speaker diarization

Authors
Duong, Thanh Thi-Hien [1]
Nguyen, Phi-Le [2]
Nguyen, Hong-Son [3]
Nguyen, Duc-Chien [3]
Phan, Huy [4]
Duong, Ngoc Q. K. [5]
Affiliations
[1] Hanoi Univ Min & Geol, Hanoi, Vietnam
[2] Hanoi Univ Sci & Technol, Hanoi, Vietnam
[3] Aimenext Join Stock Co, Ho Chi Minh City, Vietnam
[4] Queen Mary Univ London, London, England
[5] InterDigital, Issy Les Moulineaux, France
Source
2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC) | 2021
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In daily communication, several people sometimes talk simultaneously, resulting in overlapped speech segments. Such segments challenge machine listening tasks such as speaker diarization and speech recognition. This paper presents a speaker diarization framework that integrates speaker count, a building block that predicts the number of active speakers in each analysis window of the audio. The speaker count block can be developed independently of existing speaker diarization systems; its output is then used in the re-segmentation step of those systems to better label the active speakers in each window. We further investigate the effect of the analysis window size on diarization performance in an oracle setting. Our preliminary theoretical analysis shows that overlapped speech detection, a special case of speaker count, helps reduce the diarization error rate when the window size is small enough. Finally, experimental results obtained with two state-of-the-art diarization systems on a benchmark dataset confirm the potential benefit of the proposed approach.
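The abstract does not spell out how the speaker count enters re-segmentation, so the following is only a minimal sketch of one plausible reading: in each window, keep the k highest-scoring speakers, where k is the predicted count. The function name `label_windows`, the per-window score layout, and the example numbers are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def label_windows(scores: Sequence[Sequence[float]],
                  counts: Sequence[int]) -> List[List[int]]:
    """For each window, keep the `count` highest-scoring speakers.

    Assumption (not from the paper): `scores[w][s]` is some per-window
    score for speaker `s` produced by an existing diarization system,
    and `counts[w]` is the speaker count predicted for window `w`.
    """
    if len(scores) != len(counts):
        raise ValueError("scores and counts must have one entry per window")
    labels = []
    for window_scores, count in zip(scores, counts):
        # Clamp the predicted count to the number of known speakers.
        k = max(0, min(count, len(window_scores)))
        # Rank speaker indices by descending score; ties keep the lower index first.
        ranked = sorted(range(len(window_scores)),
                        key=lambda s: window_scores[s], reverse=True)
        labels.append(sorted(ranked[:k]))
    return labels


# Hypothetical scores for 3 speakers over 4 windows.
scores = [
    [0.9, 0.1, 0.0],
    [0.6, 0.5, 0.1],
    [0.2, 0.3, 0.1],
    [0.1, 0.7, 0.6],
]
counts = [1, 2, 0, 2]  # 0 = silence, 2 = overlapped speech
print(label_windows(scores, counts))  # [[0], [0, 1], [], [1, 2]]
```

In this sketch, overlapped speech detection corresponds to the special case where the count is only ever thresholded at "more than one speaker", which matches the abstract's description of it as a special case of speaker count.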
Pages: 1149-1155
Number of pages: 7