Reverberant speech recognition exploiting clarity index estimation

Cited by: 7
Authors
Parada, Pablo Peso [1]
Sharma, Dushyant [1]
Naylor, Patrick A. [2]
van Waterschoot, Toon [3]
Affiliations
[1] Nuance Commun Inc, Marlow SL7 2AF, Bucks, England
[2] Univ London Imperial Coll Sci Technol & Med, Dept Elect & Elect Engn, London SW7 2AZ, England
[3] Katholieke Univ Leuven, ESAT STADIUS ETC, Dept Elect Engn, B-3001 Leuven, Belgium
Keywords
Reverberant speech recognition; C-50; HLDA; Acoustic model selection; DEREVERBERATION; ENVIRONMENTS;
DOI
10.1186/s13634-015-0237-7
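Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code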
0808 ; 0809 ;
Abstract
We present single-channel approaches to robust automatic speech recognition (ASR) in reverberant environments based on non-intrusive estimation of the clarity index (C50). Our best-performing method includes the estimated value of C50 in the ASR feature vector and also uses C50 to select the most suitable ASR acoustic model according to the reverberation level. We evaluate our method on the REVERB Challenge database with two different C50 estimators and show that it outperforms the best challenge baseline achieved without unsupervised acoustic model adaptation, i.e. the baseline using multi-condition hidden Markov models (HMMs). Our approach achieves a 22.4% relative word error rate reduction compared with that baseline.
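The two uses of C50 named in the abstract (as an extra feature dimension and as a selector for the acoustic model) can be illustrated with a minimal sketch. It is not the authors' implementation. The paper estimates C50 non-intrusively from the reverberant speech itself, whereas this sketch assumes a known room impulse response and applies the standard definition, the ratio in dB of the energy in the first 50 ms after the direct path to the energy arriving later. The function names, the C50 boundaries, and the model labels are all hypothetical.

```python
import numpy as np


def clarity_index_c50(rir, fs):
    """C50 in dB from a known room impulse response (intrusive definition).

    Assumption: the strongest tap marks the direct path, and the response
    contains some energy after the 50 ms boundary (otherwise the ratio
    is undefined).
    """
    rir = np.asarray(rir, dtype=float)
    onset = int(np.argmax(np.abs(rir)))          # direct-path arrival
    split = onset + int(round(0.050 * fs))       # 50 ms after the direct path
    early = np.sum(rir[onset:split] ** 2)        # early energy
    late = np.sum(rir[split:] ** 2)              # late reverberant energy
    return 10.0 * np.log10(early / late)


def append_c50(features, c50_db):
    """Append one C50 value as an extra dimension to every frame.

    `features` has shape (frames, dims); the result has shape (frames, dims + 1).
    """
    features = np.asarray(features, dtype=float)
    column = np.full((features.shape[0], 1), c50_db)
    return np.hstack([features, column])


def select_model(c50_db, boundaries, model_names):
    """Pick the model whose C50 interval contains `c50_db`.

    `boundaries` is a sorted list of K-1 thresholds in dB for K models,
    ordered from most to least reverberant (hypothetical values).
    """
    return model_names[int(np.searchsorted(boundaries, c50_db))]


if __name__ == "__main__":
    fs = 1000
    rir = np.zeros(200)
    rir[0] = 1.0      # direct path
    rir[100] = 0.1    # one late reflection at 100 ms
    c50 = clarity_index_c50(rir, fs)
    frames = append_c50(np.zeros((3, 2)), c50)
    model = select_model(c50, [5.0, 15.0], ["high_reverb", "mid_reverb", "low_reverb"])
    print(c50, frames.shape, model)
```

In a real system the value passed to `append_c50` and `select_model` would come from a non-intrusive estimator running on the utterance, and each model would be trained on data from its own C50 range.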
Pages: 1-12
Page count: 12
Related papers
50 in total
  • [1] Reverberant speech recognition exploiting clarity index estimation
    Pablo Peso Parada
    Dushyant Sharma
    Patrick A. Naylor
    Toon van Waterschoot
    EURASIP Journal on Advances in Signal Processing, 2015
  • [2] IMPULSE RESPONSE ESTIMATION FOR ROBUST SPEECH RECOGNITION IN A REVERBERANT ENVIRONMENT
    Ravanelli, Mirco
    Sosi, Alessandro
    Svaizer, Piergiorgio
    Omologo, Maurizio
    2012 PROCEEDINGS OF THE 20TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2012, : 1668 - 1672
  • [3] Confidence Measures for Nonintrusive Estimation of Speech Clarity Index
    Parada, Pablo Peso
    Sharma, Dushyant
    van Waterschoot, Toon
    Naylor, Patrick A.
    JOURNAL OF THE AUDIO ENGINEERING SOCIETY, 2017, 65 (1-2): : 90 - 99
  • [4] Performance Estimation of Reverberant Speech Recognition Based on Reverberant Criteria RSR-Dn with Acoustic Parameters
    Fukumori, Takahiro
    Morise, Masanori
    Nishiura, Takanobu
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 562 - +
  • [5] MULTIRESOLUTION CNN FOR REVERBERANT SPEECH RECOGNITION
    Park, Sunchan
    Jeong, Yongwon
    Kim, Hyung Soon
    2017 20TH CONFERENCE OF THE ORIENTAL CHAPTER OF THE INTERNATIONAL COORDINATING COMMITTEE ON SPEECH DATABASES AND SPEECH I/O SYSTEMS AND ASSESSMENT (O-COCOSDA), 2017, : 214 - 217
  • [6] REVERBERANT SPEECH RECOGNITION: A PHONEME ANALYSIS
    Parada, Pablo Peso
    Sharma, Dushyant
    Naylor, Patrick A.
    van Waterschoot, Toon
    2014 IEEE GLOBAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING (GLOBALSIP), 2014, : 567 - 571
  • [7] Mask estimation and imputation methods for missing data speech recognition in a multisource reverberant environment
    Keronen, Sami
    Kallasjoki, Heikki
    Remes, Ulpu
    Brown, Guy J.
    Gemmeke, Jort F.
    Palomaki, Kalle J.
    COMPUTER SPEECH AND LANGUAGE, 2013, 27 (03): : 798 - 819
  • [8] Reverberant Speech Recognition Based on Denoising Autoencoder
    Ishii, Takaaki
    Komiyama, Hiroki
    Shinozaki, Takahiro
    Horiuchi, Yasuo
    Kuroiwa, Shingo
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013, : 3479 - 3483
  • [9] A STUDY ON DATA AUGMENTATION OF REVERBERANT SPEECH FOR ROBUST SPEECH RECOGNITION
    Ko, Tom
    Peddinti, Vijayaditya
    Povey, Daniel
    Seltzer, Michael L.
    Khudanpur, Sanjeev
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 5220 - 5224
  • [10] Modulation spectrum analysis for recognition of reverberant speech
    Mallidi, Sri Harish
    Ganapathy, Sriram
    Hermansky, Hynek
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 196 - 199