Uncertainty decoding with adaptive sampling for noise robust DNN-based acoustic modeling

Cited by: 0
Authors
Tran, Dung T. [1]
Delcroix, Marc [1]
Ogawa, Atsunori [1]
Nakatani, Tomohiro [1]
Affiliations
[1] NTT Corp, NTT Commun Sci Labs, 2-4 Hikaridai, Seika Cho, Keihanna Sci City, Kyoto 6190237, Japan
Source
18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION | 2017
Keywords
speech recognition; deep neural network; uncertainty decoding; adaptation; SPEECH; COMPENSATION; PROPAGATION;
DOI
10.21437/Interspeech.2017-793
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Although deep neural network (DNN) based acoustic models have obtained remarkable results, automatic speech recognition (ASR) performance remains low in noisy and reverberant conditions. To address this issue, a speech enhancement front-end is often used before recognition to reduce noise. However, the front-end cannot fully suppress noise and often introduces artifacts that limit the improvement in ASR performance. Uncertainty decoding has been proposed to better interconnect the speech enhancement front-end and the ASR back-end and to mitigate the mismatch caused by residual noise and artifacts. By treating features as distributions instead of point estimates, the uncertainty decoding approach modifies the conventional decoding rules to account for the uncertainty emanating from the speech enhancement. Although the concept of uncertainty decoding has recently been investigated for DNN acoustic models, finding efficient ways to incorporate the distribution of the enhanced features within a DNN acoustic model still requires further investigation. In this paper, we propose to parameterize the distribution of the enhanced feature and to estimate the parameters by backpropagation using an unsupervised adaptation scheme. We demonstrate the effectiveness of the proposed approach on real audio data from the CHiME3 dataset.
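The idea of treating an enhanced feature as a distribution rather than a point estimate can be illustrated with a minimal sampling sketch. This is not the authors' method: the abstract does not specify their parameterization or adaptation scheme. The sketch assumes a diagonal Gaussian around the enhanced feature and a hypothetical one-layer softmax model (toy_acoustic_model); all names, shapes, and values are illustrative assumptions. The state posterior is approximated by averaging the model output over samples drawn from the feature distribution.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def toy_acoustic_model(x, W, b):
    # Hypothetical stand-in for a DNN acoustic model:
    # maps feature vectors to state posteriors.
    return softmax(x @ W + b)

def uncertainty_decode(mean, var, W, b, n_samples=100, rng=None):
    # Assumption: the enhanced feature follows a diagonal Gaussian
    # with the given mean and variance. Draw samples and average
    # the model posteriors (Monte Carlo approximation).
    rng = np.random.default_rng(0) if rng is None else rng
    eps = rng.standard_normal((n_samples,) + mean.shape)
    samples = mean + np.sqrt(var) * eps      # shape: (n_samples, dim)
    return toy_acoustic_model(samples, W, b).mean(axis=0)

# Illustrative values only (4-dim feature, 3 output states).
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))
b = np.zeros(3)
mean = rng.standard_normal(4)
var = np.full(4, 0.5)

post = uncertainty_decode(mean, var, W, b, rng=rng)
point = toy_acoustic_model(mean, W, b)       # conventional point-estimate decoding
zero_var = uncertainty_decode(mean, np.zeros(4), W, b, rng=rng)
```

With zero variance the sampled posterior reduces to the conventional point-estimate posterior, which is a quick sanity check on the sketch.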
Pages: 3852-3856
Number of pages: 5
Related papers
50 in total
  • [21] Model-based feature enhancement with uncertainty decoding for noise robust ASR
    Stouten, Veronique
    Van hamme, Hugo
    Wambacq, Patrick
    SPEECH COMMUNICATION, 2006, 48 (11) : 1502 - 1514
  • [22] Robust DNN-based VAD augmented with phone entropy based rejection of background speech
    Fujita, Yuya
    Iso, Ken-ichi
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 3663 - 3667
  • [23] DNN-Based Speech Enhancement Using Soft Audible Noise Masking for Wind Noise Reduction
    Bai, Haichuan
    Ge, Fengpei
    Yan, Yonghong
    CHINA COMMUNICATIONS, 2018, 15 (09) : 235 - 243
  • [24] AN UNCERTAINTY DECODING APPROACH TO NOISE- AND REVERBERATION-ROBUST SPEECH RECOGNITION
    Maas, Roland
    Thippur, Akshaya
    Sehr, Armin
    Kellermann, Walter
    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 7388 - 7392
  • [25] JOINT UNCERTAINTY DECODING WITH THE SECOND ORDER APPROXIMATION FOR NOISE ROBUST SPEECH RECOGNITION
    Xu, Haitian
    Chin, K. K.
    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 3841 - 3844
  • [26] Delta-MelSpectra Features for Noise Robustness to DNN-based ASR systems
    Kumar, Kshitiz
    Liu, Chaojun
    Gong, Yifan
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 2445 - 2448
  • [27] DNN-Based Speech Enhancement Using Soft Audible Noise Masking for Wind Noise Reduction
    Haichuan Bai
    Fengpei Ge
    Yonghong Yan
    China Communications, 2018, 15 (09) : 235 - 243
  • [28] Adaptive DNN-based CSI Feedback with Quantization for FDD Massive MIMO Systems
    Gao, Junjie
    Bouazizi, Mondher
    Ohtsuki, Tomoaki
    Gui, Guan
    2022 IEEE 96TH VEHICULAR TECHNOLOGY CONFERENCE (VTC2022-FALL), 2022,
  • [29] INTEGRATING DNN-BASED AND SPATIAL CLUSTERING-BASED MASK ESTIMATION FOR ROBUST MVDR BEAMFORMING
    Nakatani, Tomohiro
    Ito, Nobutaka
    Higuchi, Takuya
    Araki, Shoko
    Kinoshita, Keisuke
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 286 - 290
  • [30] Constrained DNN-Based Robust Model Predictive Control Scheme with Adjustable Error Tube
    Yang, Shizhong
    Liu, Yanli
    Cao, Huidong
    SYMMETRY-BASEL, 2023, 15 (10):