Reverberant speech recognition exploiting clarity index estimation

Cited by: 7
Authors
Parada, Pablo Peso [1]
Sharma, Dushyant [1]
Naylor, Patrick A. [2]
van Waterschoot, Toon [3]
Affiliations
[1] Nuance Commun Inc, Marlow SL7 2AF, Bucks, England
[2] Univ London Imperial Coll Sci Technol & Med, Dept Elect & Elect Engn, London SW7 2AZ, England
[3] Katholieke Univ Leuven, ESAT STADIUS ETC, Dept Elect Engn, B-3001 Leuven, Belgium
Keywords
Reverberant speech recognition; C-50; HLDA; Acoustic model selection; Dereverberation; Environments
DOI
10.1186/s13634-015-0237-7
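Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809
Abstract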
We present single-channel approaches to robust automatic speech recognition (ASR) in reverberant environments based on non-intrusive estimation of the clarity index (C50). Our best performing method includes the estimated value of C50 in the ASR feature vector and also uses C50 to select the most suitable ASR acoustic model according to the reverberation level. We evaluate our method on the REVERB Challenge database employing two different C50 estimators and show that our method outperforms the best baseline of the challenge achieved without unsupervised acoustic model adaptation, i.e. using multi-condition hidden Markov models (HMMs). Our approach achieves a 22.4% relative word error rate reduction in comparison to the best baseline of the challenge.
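The two uses of C50 named in the abstract (an extra feature dimension and a model selector) can be illustrated with a minimal sketch. It is not the authors' implementation: the paper estimates C50 non-intrusively from the reverberant signal, whereas this sketch computes it from a known room impulse response using the standard definition (early energy in the first 50 ms over late energy, in dB). The function names, the C50 boundaries, and the model labels are all hypothetical.

```python
import numpy as np

def clarity_index_c50(rir, fs):
    """Standard C50 in dB from a known room impulse response (not a blind estimate)."""
    rir = np.asarray(rir, dtype=float)
    start = int(np.argmax(np.abs(rir)))        # assumption: strongest tap is the direct path
    split = start + int(round(0.050 * fs))     # 50 ms after the direct path
    early = np.sum(rir[start:split] ** 2)
    late = np.sum(rir[split:] ** 2)            # must be non-zero for a finite result
    return 10.0 * np.log10(early / late)

def append_c50(features, c50):
    """Append the utterance-level C50 value to every frame of a (frames x dims) matrix."""
    features = np.asarray(features, dtype=float)
    column = np.full((features.shape[0], 1), c50)
    return np.hstack([features, column])

def select_model(c50, boundaries=(10.0, 20.0),
                 names=("high_reverb", "mid_reverb", "low_reverb")):
    """Pick an acoustic model label by C50 range; boundaries and names are hypothetical."""
    return names[int(np.searchsorted(boundaries, c50))]

# Toy impulse response: direct path at t = 0 and one late reflection at 50 ms.
fs = 1000
rir = np.zeros(100)
rir[0] = 1.0
rir[50] = 0.1
c50 = clarity_index_c50(rir, fs)               # about 20 dB
frames = append_c50(np.zeros((3, 13)), c50)    # 13 base dims plus one C50 dim
model = select_model(5.0)                      # low C50 means strong reverberation
```

A lower C50 indicates more late reverberant energy, so in this sketch low values map to the model label for the most reverberant condition.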
Pages: 1-12
Number of pages: 12
Related papers
50 in total
  • [41] Blind Model Selection for Automatic Speech Recognition in Reverberant Environments
    Laurent Couvreur
    Christophe Couvreur
    Journal of VLSI signal processing systems for signal, image and video technology, 2004, 36 : 189 - 203
  • [42] On Improvement to Non-Reference Speech Intelligibility Estimation Accuracy for Reverberant Speech
    Nakazawa K.
    Kondo K.
    IEEJ Transactions on Electronics, Information and Systems, 2023, 143 (08) : 830 - 841
  • [43] Blind model selection for automatic speech recognition in reverberant environments
    Couvreur, L
    Couvreur, C
    JOURNAL OF VLSI SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2004, 36 (2-3) : 189 - 203
  • [44] The effect of GMM order and CMS on speaker recognition with reverberant speech
    Shabtai, Noam R.
    Zigel, Yaniv
    Rafaely, Boaz
    2008 HANDS-FREE SPEECH COMMUNICATION AND MICROPHONE ARRAYS, 2008, : 145 - +
  • [45] Model adaptation based on HMM decomposition for reverberant speech recognition
    Takiguchi, T
    Nakamura, S
    Huo, Q
    Shikano, K
    1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I - V: VOL I: PLENARY, EXPERT SUMMARIES, SPECIAL, AUDIO, UNDERWATER ACOUSTICS, VLSI; VOL II: SPEECH PROCESSING; VOL III: SPEECH PROCESSING, DIGITAL SIGNAL PROCESSING; VOL IV: MULTIDIMENSIONAL SIGNAL PROCESSING, NEURAL NETWORKS - VOL V: STATISTICAL SIGNAL AND ARRAY PROCESSING, APPLICATIONS, 1997, : 827 - 830
  • [46] An Auditory Based Modulation Spectral Feature for Reverberant Speech Recognition
    Maganti, HariKrishna
    Matassoni, Marco
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 570 - 573
  • [47] Speech Clarity Index (Ψ): A Distance-Based Speech Quality Indicator and Recognition Rate Prediction for Dysarthric Speakers with Cerebral Palsy
    Kayasith, Prakasith
    Theeramunkong, Thanaruk
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2009, E92D (03) : 460 - 468
  • [48] Effects of Speech Clarity on Recognition Memory for Spoken Sentences
    Van Engen, Kristin J.
    Chandrasekaran, Bharath
    Smiljanic, Rajka
    PLOS ONE, 2012, 7 (09):
  • [49] SPEECH FEATURE DENOISING AND DEREVERBERATION VIA DEEP AUTOENCODERS FOR NOISY REVERBERANT SPEECH RECOGNITION
    Feng, Xue
    Zhang, Yaodong
    Glass, James
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [50] Robust Front End Processing for Speech Recognition in Reverberant Environments: Utilization of Speech Characteristics
    Petrick, Rico
    Lu, Xugang
    Unoki, Masashi
    Akagi, Masato
    Hoffmann, Ruediger
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 658 - +