Uncertainty decoding with adaptive sampling for noise robust DNN-based acoustic modeling

Cited by: 0
Authors
Tran, Dung T. [1]
Delcroix, Marc [1]
Ogawa, Atsunori [1]
Nakatani, Tomohiro [1]
Affiliations
[1] NTT Corp, NTT Commun Sci Labs, 2-4 Hikaridai, Seika Cho, Keihanna Sci City, Kyoto 6190237, Japan
Source
18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION | 2017
Keywords
speech recognition; deep neural network; uncertainty decoding; adaptation; SPEECH; COMPENSATION; PROPAGATION;
DOI
10.21437/Interspeech.2017-793
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Although deep neural network (DNN) based acoustic models have obtained remarkable results, automatic speech recognition (ASR) performance remains low in noisy and reverberant conditions. To address this issue, a speech enhancement front-end is often used before recognition to reduce noise. However, the front-end cannot fully suppress noise and often introduces artifacts that limit the ASR performance improvement. Uncertainty decoding has been proposed to better interconnect the speech enhancement front-end and the ASR back-end and to mitigate the mismatch caused by residual noise and artifacts. By treating features as distributions instead of point estimates, the uncertainty decoding approach modifies the conventional decoding rules to account for the uncertainty emanating from the speech enhancement. Although the concept of uncertainty decoding has recently been investigated for DNN acoustic models, finding efficient ways to incorporate the distribution of the enhanced features within a DNN acoustic model still requires further investigation. In this paper, we propose to parameterize the distribution of the enhanced features and estimate the parameters by backpropagation using an unsupervised adaptation scheme. We demonstrate the effectiveness of the proposed approach on real audio data from the CHiME3 dataset.
Pages: 3852 - 3856
Number of pages: 5
Related papers
50 in total
  • [41] Low-rank and sparse subspace modeling of speech for DNN based acoustic modeling
    Dighe, Pranay
    Asaei, Afsaneh
    Bourlard, Herve
    SPEECH COMMUNICATION, 2019, 109 : 34 - 45
  • [42] ON GENERATING MIXING NOISE SIGNALS WITH BASIS FUNCTIONS FOR SIMULATING NOISY SPEECH AND LEARNING DNN-BASED SPEECH ENHANCEMENT MODELS
    Wen, Shi-Xue
    Du, Jun
    Lee, Chin-Hui
    2017 IEEE 27TH INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING, 2017,
  • [43] Robust i-vector based Adaptation of DNN Acoustic Model for Speech Recognition
    Garimella
    Mandal, Arindam
    Strom, Nikko
    Hoffmeister, Bjorn
    Matsoukas, Spyros
    Parthasarathi, Hari Krishnan
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 2877 - 2881
  • [44] EXPLOITING LOW-DIMENSIONAL STRUCTURES TO ENHANCE DNN BASED ACOUSTIC MODELING IN SPEECH RECOGNITION
    Dighe, Pranay
    Luyet, Gil
    Asaei, Afsaneh
    Bourlard, Herve
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5690 - 5694
  • [45] CNMF-BASED ACOUSTIC FEATURES FOR NOISE-ROBUST ASR
    Vaz, Colin
    Dimitriadis, Dimitrios
    Thomas, Samuel
    Narayanan, Shrikanth
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5735 - 5739
  • [46] Generating complementary acoustic model spaces in DNN-based sequence-to-frame DTW scheme for out-of-vocabulary spoken term detection
    Lee, Shi-wook
    Tanaka, Kazuyo
    Itoh, Yoshiaki
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 755 - 759
  • [47] Frame-wise Online Unsupervised Adaptation of DNN-HMM Acoustic Model from Perspective of Robust Adaptive Filtering
    Takeda, Ryu
    Komatani, Kazunori
    INTERSPEECH 2020, 2020, : 1291 - 1295
  • [48] Joint Noise and Reverberation Adaptive Learning for Robust Speaker DOA Estimation with An Acoustic Vector Sensor
    Wang, Disong
    Zou, Yuexian
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 821 - 825
  • [49] ROBUST MASK ESTIMATION BY INTEGRATING NEURAL NETWORK-BASED AND CLUSTERING-BASED APPROACHES FOR ADAPTIVE ACOUSTIC BEAMFORMING
    Zhou, Ying
    Qian, Yanmin
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 536 - 540
  • [50] An end-to-end DNN-HMM based system with duration modeling for robust earthquake detection
    Martin, Catalina Murua Marcelo
    Marin, Marcelo
    Cofre, Aaron
    Wuth, Jorge
    Pino, Oscar Vasquez
    Yoma, Nestor Becerra
    COMPUTERS & GEOSCIENCES, 2023, 179