Speaker-adaptive speech recognition using speaker diarization for improved transcription of large spoken archives

Cited by: 14
Authors
Cerva, Petr [1]
Silovsky, Jan [1]
Zdansky, Jindrich [1]
Nouza, Jan [1]
Seps, Ladislav [1]
Affiliations
[1] Tech Univ Liberec, Inst Informat Technol & Elect, Liberec 46117, Czech Republic
Keywords
Speaker adaptive; Automatic speech recognition; Speaker adaptation; Speaker diarization; Automatic transcription; Large spoken archives; ADAPTATION; ACCESS;
DOI
10.1016/j.specom.2013.06.017
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
This paper deals with speaker-adaptive speech recognition for large spoken archives. The goal is to improve the recognition accuracy of an automatic speech recognition (ASR) system that is being deployed for transcription of a large archive of Czech radio. This archive represents a significant part of Czech cultural heritage, as it contains recordings covering 90 years of broadcasting. A large portion of these documents (100,000 h) is to be transcribed and made public for browsing. To improve the transcription results, an efficient speaker-adaptive scheme is proposed. The scheme is based on integration of speaker diarization and adaptation methods and is designed to achieve a low Real-Time Factor (RTF) of the entire adaptation process, because the archive's size is enormous. It thus employs just two decoding passes, where the first one is carried out using the lexicon with a reduced number of items. Moreover, the transcripts from the first pass serve not only for adaptation, but also as the input to the speaker diarization module, which employs two-stage clustering. The output of diarization is then utilized for a cluster-based unsupervised Speaker Adaptation (SA) approach that also utilizes information based on the gender of each individual speaker. Presented experimental results on various types of programs show that our adaptation scheme yields a significant Word Error Rate (WER) reduction from 22.24% to 18.85% over the Speaker Independent (SI) system while operating at a reasonable RTF. (c) 2013 Elsevier B.V. All rights reserved.
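The two figures of merit named in the abstract, Word Error Rate (WER) and Real-Time Factor (RTF), can be illustrated with a minimal sketch. It uses the standard definitions (word-level edit distance divided by reference length, and processing time divided by audio duration); the function names are hypothetical and this is not the authors' evaluation code. Only the 22.24% and 18.85% values come from the abstract.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Relative WER reduction of the adapted system over the baseline."""
    return (baseline_wer - adapted_wer) / baseline_wer


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent processing / duration of the processed audio."""
    return processing_seconds / audio_seconds


if __name__ == "__main__":
    # WER values reported in the abstract (SI baseline vs. adapted system).
    rel = relative_reduction(22.24, 18.85)
    print(f"absolute reduction: {22.24 - 18.85:.2f} points")
    print(f"relative reduction: {rel:.1%}")
```

Under these definitions, the reported drop from 22.24% to 18.85% is an absolute reduction of 3.39 points, or roughly 15% relative to the SI baseline.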
Pages: 1033-1046
Number of pages: 14
Related papers
50 in total
  • [41] ANSD-MA-MSE: Adaptive Neural Speaker Diarization Using Memory-Aware Multi-Speaker Embedding
    He, Mao-Kui
    Du, Jun
    Liu, Qing-Feng
    Lee, Chin-Hui
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31 : 1561 - 1573
  • [42] SPEAKER DIARIZATION AND SPEECH RECOGNITION IN THE SEMI-AUTOMATIZATION OF AUDIO DESCRIPTION: AN EXPLORATORY STUDY ON FUTURE POSSIBILITIES?
    Delgado, Hector
    Matamala, Anna
    Serrano, Javier
    CADERNOS DE TRADUCAO, 2015, 35 (02): : 308 - 324
  • [43] Speaker adaptation techniques for speech recognition using probabilistic models
    Shinoda, K
    ELECTRONICS AND COMMUNICATIONS IN JAPAN PART III-FUNDAMENTAL ELECTRONIC SCIENCE, 2005, 88 (12): : 25 - 42
  • [44] Continuous speech recognition using an on-line speaker adaptation method based on automatic speaker clustering
    Zhang, W
    Nakagawa, S
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2003, E86D (03) : 464 - 473
  • [45] INFORMATION BOTTLENECK BASED SPEAKER DIARIZATION OF MEETINGS USING NON-SPEECH AS SIDE INFORMATION
    Yella, Sree Harsha
    Bourlard, Herve
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [46] CHARACTERIZING PERFORMANCE OF SPEAKER DIARIZATION SYSTEMS ON FAR-FIELD SPEECH USING STANDARD METHODS
    Maciejewski, Matthew
    Snyder, David
    Manohar, Vimal
    Dehak, Najim
    Khudanpur, Sanjeev
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5244 - 5248
  • [47] AISHELL-4: An Open Source Dataset for Speech Enhancement, Separation, Recognition and Speaker Diarization in Conference Scenario
    Fu, Yihui
    Cheng, Luyao
    Lv, Shubo
    Jv, Yukai
    Kong, Yuxiang
    Chen, Zhuo
    Hu, Yanxin
    Xie, Lei
    Wu, Jian
    Bu, Hui
    Xu, Xin
    Du, Jun
    Chen, Jingdong
    INTERSPEECH 2021, 2021, : 3665 - 3669
  • [48] Speaker adaptive training and mixup regularization for neural network acoustic models in automatic speech recognition
    Tomashenko, Natalia
    Khokhlov, Yuri
    Esteve, Yannick
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 2414 - 2418
  • [49] MAP Based Speaker Adaptation in Very Large Vocabulary Speech Recognition of Czech
    Cerva, Petr
    Nouza, Jan
    RADIOENGINEERING, 2004, 13 (03) : 42 - 46
  • [50] Deep learning-based speaker-adaptive postfiltering with limited adaptation data for embedded text-to-speech synthesis systems
    Eren, Eray
    Demiroglu, Cenk
    COMPUTER SPEECH AND LANGUAGE, 2023, 81