Uncertainty decoding with adaptive sampling for noise robust DNN-based acoustic modeling

被引:0
|
作者
Tran, Dung T. [1 ]
Delcroix, Marc [1 ]
Ogawa, Atsunori [1 ]
Nakatani, Tomohiro [1 ]
机构
[1] NTT Corp, NTT Commun Sci Labs, 2-4 Hikaridai,Seika Cho, Keihana Sci City, Kyoto 6190237, Japan
来源
18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION | 2017年
关键词
speech recognition; deep neural network; uncertainty decoding; adaptation; SPEECH; COMPENSATION; PROPAGATION;
D O I
10.21437/Interspeech.2017-793
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Although deep neural network (DNN) based acoustic models have obtained remarkable results, the automatic speech recognition (ASR) performance still remains low in noise and reverberant conditions. To address this issue, a speech enhancement front-end is often used before recognition to reduce noise. However, the front-end cannot fully suppress noise and often introduces artifacts that are limiting the ASR performance improvement. Uncertainty decoding has been proposed to better interconnect the speech enhancement front-end and ASR back-end and mitigate the mismatch caused by residual noise and artifacts. By considering features as distributions instead of point estimates, the uncertainty decoding approach modifies the conventional decoding rules to account for the uncertainty emanating from the speech enhancement. Although the concept of uncertainty decoding has been investigated for DNN acoustic models recently, finding efficient ways to incorporate distribution of the enhanced features within a DNN acoustic model still requires further investigations. In this paper, we propose to parameterize the distribution of the enhanced feature and estimate the parameters by backpropagation using an unsupervised adaptation scheme. We demonstrate the effectiveness of the proposed approach on real audio data of the CHiME3 dataset.
引用
收藏
页码:3852 / 3856
页数:5
相关论文
共 50 条
  • [31] ONLINE INTEGRATION OF DNN-BASED AND SPATIAL CLUSTERING-BASED MASK ESTIMATION FOR ROBUST MVDR BEAMFORMING
    Matsui, Yutaro
    Nakatani, Tomohiro
    Delcroix, Marc
    Kinoshita, Keisuke
    Ito, Nobutaka
    Araki, Shoko
    Makino, Shoji
    2018 16TH INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC), 2018, : 71 - 75
  • [32] DNN-Based Knee OA Severity Prediction System: Pathologically Robust Feature Engineering Approach
    Ruikar D.
    Kamble P.
    Ruikar A.
    Houde K.
    Hegadi R.
    SN Computer Science, 4 (1)
  • [33] Mixed-Bandwidth Cross-Channel Speech Recognition via Joint Optimization of DNN-Based Bandwidth Expansion and Acoustic Modeling
    Gao, Jianqing
    Du, Jun
    Chen, Enhong
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2019, 27 (03) : 559 - 571
  • [34] Adaptive training with joint uncertainty decoding for robust recognition of noisy data
    Liao, H.
    Gales, M. J. F.
    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007, : 389 - +
  • [35] Robust Beam forming for Speech Recognition Using DNN-Based Time-Frequency Masks Estimation
    Jiang, Wenbin
    Wen, Fei
    Liu, Peilin
    IEEE ACCESS, 2018, 6 : 52385 - 52392
  • [36] DNN-Based Surrogate Modeling-Based Feasible Performance Reliability Design Methodology for Aircraft Engine
    Cao, Dalu
    Bai, Guang-Chen
    IEEE ACCESS, 2020, 8 : 229201 - 229218
  • [37] JOINT NOISE AND MASK AWARE TRAINING FOR DNN-BASED SPEECH ENHANCEMENT WITH SUB-BAND FEATURES
    Wang, Qing
    Du, Jun
    Dai, Li-Rong
    Lee, Chin-Hui
    2017 HANDS-FREE SPEECH COMMUNICATIONS AND MICROPHONE ARRAYS (HSCMA 2017), 2017, : 101 - 105
  • [38] Transfer learning for acoustic modeling of noise robust speech recognition
    Yi J.
    Tao J.
    Liu B.
    Wen Z.
    Qinghua Daxue Xuebao/Journal of Tsinghua University, 2018, 58 (01): : 55 - 60
  • [39] MULTI SCALE FEEDBACK CONNECTION FOR NOISE ROBUST ACOUSTIC MODELING
    Tran, Dung T.
    Iso, Ken-ichi
    Omachi, Motoi
    Fujita, Yuya
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 4834 - 4838
  • [40] Robust Speech Recognition using DNN-HMM Acoustic Model Combining Noise-aware training with Spectral Subtraction
    Abe, Akihiro
    Yamamoto, Kazumasa
    Nakagawa, Seiichi
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 2849 - 2853