Uncertainty decoding with adaptive sampling for noise robust DNN-based acoustic modeling

Authors
Tran, Dung T. [1]
Delcroix, Marc [1]
Ogawa, Atsunori [1]
Nakatani, Tomohiro [1]
Affiliation
[1] NTT Corp, NTT Commun Sci Labs, 2-4 Hikaridai, Seika Cho, Keihana Sci City, Kyoto 6190237, Japan
Source
18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017
Keywords
speech recognition; deep neural network; uncertainty decoding; adaptation; SPEECH; COMPENSATION; PROPAGATION
DOI
10.21437/Interspeech.2017-793
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Although deep neural network (DNN)-based acoustic models have achieved remarkable results, automatic speech recognition (ASR) performance remains low in noisy and reverberant conditions. To address this issue, a speech enhancement front-end is often used before recognition to reduce noise. However, the front-end cannot fully suppress noise and often introduces artifacts, which limits the improvement in ASR performance. Uncertainty decoding has been proposed to better interconnect the speech enhancement front-end and the ASR back-end and to mitigate the mismatch caused by residual noise and artifacts. By treating features as distributions instead of point estimates, the uncertainty decoding approach modifies the conventional decoding rules to account for the uncertainty emanating from speech enhancement. Although the concept of uncertainty decoding has recently been investigated for DNN acoustic models, finding efficient ways to incorporate the distribution of the enhanced features within a DNN acoustic model still requires further investigation. In this paper, we propose to parameterize the distribution of the enhanced features and to estimate its parameters by backpropagation using an unsupervised adaptation scheme. We demonstrate the effectiveness of the proposed approach on real audio data from the CHiME3 dataset.
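To make "features as distributions instead of point estimates" concrete, here is a minimal sketch of sampling-based uncertainty decoding, written under stated assumptions rather than as the authors' method. It assumes the enhanced feature is modeled as a diagonal Gaussian with mean `mu` (the enhanced feature) and per-dimension standard deviation `sigma`, and it approximates the state posteriors by averaging the acoustic model's outputs over random draws from that Gaussian. The one-layer softmax `dnn_posteriors`, the weights, and every other name here are hypothetical stand-ins, not the paper's DNN or its adaptive sampling scheme; in the approach described above, the distribution parameters (here assumed to be `sigma`) would be estimated by backpropagation with unsupervised adaptation instead of being fixed by hand.

```python
import math
import random


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def dnn_posteriors(x, W, b):
    """Hypothetical stand-in for a DNN acoustic model:
    a single affine layer followed by softmax over HMM states."""
    logits = [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]
    return softmax(logits)


def uncertainty_decode(mu, sigma, W, b, num_samples=100, seed=0):
    """Monte Carlo estimate of E_{x ~ N(mu, diag(sigma^2))}[p(state | x)].

    mu    -- enhanced feature vector (mean of the assumed Gaussian)
    sigma -- per-dimension standard deviation (assumed uncertainty parameters)
    """
    rng = random.Random(seed)
    acc = [0.0] * len(b)
    for _ in range(num_samples):
        # Draw one plausible clean feature around the enhanced feature.
        x = [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]
        p = dnn_posteriors(x, W, b)
        acc = [a + pk for a, pk in zip(acc, p)]
    # Average the posteriors over all samples.
    return [a / num_samples for a in acc]


# Usage with made-up numbers (hypothetical 2-dim feature, 3 HMM states).
W = [[1.0, -0.5], [-1.0, 0.5], [0.2, 0.2]]
b = [0.0, 0.0, 0.1]
mu = [0.8, -0.3]
print(uncertainty_decode(mu, [0.5, 0.5], W, b))
```

Setting `sigma` to zero reduces the sketch to conventional point-estimate decoding, which is a convenient sanity check; larger values spread the samples so that the averaged posteriors reflect the residual uncertainty of the enhanced feature.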
Pages: 3852–3856
Number of pages: 5