SPATIAL DIFFUSENESS FEATURES FOR DNN-BASED SPEECH RECOGNITION IN NOISY AND REVERBERANT ENVIRONMENTS

Cited by: 0
Authors
Schwarz, Andreas [1]
Huemmer, Christian [1]
Maas, Roland [1]
Kellermann, Walter [1]
Affiliations
[1] Friedrich Alexander Univ Erlangen Nurnberg FAU, Multimedia Commun & Signal Proc, Cauerstr 7, D-91058 Erlangen, Germany
Keywords
Speech Recognition; Reverberation; Diffuse Noise; Deep Neural Networks; PERCEPTION;
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
We propose a spatial diffuseness feature for deep neural network (DNN)-based automatic speech recognition to improve recognition accuracy in reverberant and noisy environments. The feature is computed in real time from multiple microphone signals without requiring knowledge or estimation of the direction of arrival, and represents the relative amount of diffuse noise in each time-frequency bin. Using the diffuseness feature as an additional input to a DNN-based acoustic model reduces the word error rate on the REVERB challenge corpus, compared both to logmelspec features extracted from noisy signals and to features enhanced by spectral subtraction.
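The abstract does not state which estimator is used, so the following is only an illustrative sketch of how a per-bin diffuseness value could be obtained from two microphones without a direction of arrival. It assumes a common mixture model in which the observed inter-microphone coherence is a blend of a fully coherent source (unit-magnitude coherence, unknown phase) and an ideal diffuse field with real-valued sinc coherence, weighted by a coherent-to-diffuse ratio (CDR). The function names, the speed-of-sound constant, and the mapping diffuseness = 1 / (1 + CDR) are assumptions made for this example, not details taken from the paper.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s (assumed value for this sketch)


def diffuse_coherence(freqs_hz, mic_distance_m):
    """Coherence of an ideal spherically isotropic noise field between
    two omnidirectional microphones: sin(2*pi*f*d/c) / (2*pi*f*d/c)."""
    # np.sinc(x) is sin(pi*x)/(pi*x), hence the factor 2 instead of 2*pi.
    return np.sinc(2.0 * np.asarray(freqs_hz) * mic_distance_m / SPEED_OF_SOUND)


def diffuseness(gamma_x, gamma_n, eps=1e-6):
    """Hypothetical DOA-independent diffuseness estimate per bin.

    Assumed model: gamma_x = (CDR * gamma_s + gamma_n) / (CDR + 1),
    with |gamma_s| = 1 and unknown phase. Imposing |gamma_s| = 1 gives a
    quadratic in CDR whose non-negative root is used here.
    """
    gamma_x = np.asarray(gamma_x, dtype=complex)
    gamma_n = np.asarray(gamma_n, dtype=float)

    # Keep |gamma_x| strictly below 1 so the quadratic does not degenerate.
    mag = np.abs(gamma_x)
    gamma_x = gamma_x * np.minimum(1.0, (1.0 - eps) / np.maximum(mag, eps))

    # a*CDR^2 + b*CDR + d = 0, with a < 0 and d >= 0.
    a = np.abs(gamma_x) ** 2 - 1.0
    b = 2.0 * (np.abs(gamma_x) ** 2 - gamma_n * gamma_x.real)
    d = np.abs(gamma_x - gamma_n) ** 2

    cdr = (-b - np.sqrt(b ** 2 - 4.0 * a * d)) / (2.0 * a)
    cdr = np.maximum(cdr, 0.0)

    # Diffuseness in [0, 1]: 1 means purely diffuse, near 0 means coherent.
    return 1.0 / (1.0 + cdr)
```

In practice `gamma_x` would be estimated per time-frequency bin from recursively averaged auto- and cross-power spectra of the two microphone signals, and the resulting values could be appended to the spectral features as extra acoustic-model inputs, as the abstract describes.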
Pages: 4380-4384
Number of pages: 5