JOINT ACOUSTIC FACTOR LEARNING FOR ROBUST DEEP NEURAL NETWORK BASED AUTOMATIC SPEECH RECOGNITION

Cited by: 0
Authors
Kundu, Souvik [1]
Mantena, Gautam [1]
Qian, Yanmin [2]
Tan, Tian [2]
Delcroix, Marc [3]
Sim, Khe Chai [1]
Affiliations
[1] Natl Univ Singapore, Sch Comp, Singapore, Singapore
[2] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai, Peoples R China
[3] NTT Corp, NTT Commun Sci Labs, Kyoto, Japan
Keywords
deep neural networks; joint factor learning; adaptation; bottleneck vectors; robust speech recognition; speaker adaptation; transformations
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Deep neural networks (DNNs) for acoustic modeling have been shown to provide impressive results in many state-of-the-art automatic speech recognition (ASR) applications. However, DNN performance degrades when training and testing conditions are mismatched, so adaptation is necessary. In this paper, we explore the use of discriminative auxiliary input features obtained using joint acoustic factor learning for DNN adaptation. These features are derived from a bottleneck (BN) layer of a DNN and are referred to as BN vectors. To derive these BN vectors, we explore two types of joint acoustic factor learning that capture speaker information together with auxiliary information such as noise, phone and articulatory information of speech. We show that these BN vectors can be used for adaptation and thereby improve the performance of an ASR system. We also show that performance can be further improved by combining these BN vectors with conventional i-vectors. Experiments are performed on the Aurora-4, REVERB challenge and AMI databases.
Pages: 5025-5029
Page count: 5
Related papers
50 in total
  • [1] Binaural Deep Neural Network for Noise Robust Automatic Speech Recognition
    Jiang, Yi
    Zu, Yuan-Yuan
    INTERNATIONAL CONFERENCE ON CONTROL ENGINEERING AND AUTOMATION (ICCEA 2014), 2014: 512-517
  • [2] Deep Neural Network Based Speech Separation for Robust Speech Recognition
    Tu Yanhui
    Jun, Du
    Xu Yong
    Dai Lirong
    Chin-Hui, Lee
    2014 12TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP), 2014: 532-536
  • [3] Neural Network Based Multi-Factor Aware Joint Training for Robust Speech Recognition
    Qian, Yanmin
    Tan, Tian
    Yu, Dong
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2016, 24 (12): 2231-2240
  • [4] Robust automatic speech recognition based on neural network in reverberant environments
    Bai, L.
    Li, H. L.
    He, Y. Y.
    CIVIL, ARCHITECTURE AND ENVIRONMENTAL ENGINEERING, VOLS 1 AND 2, 2017: 1319-1324
  • [5] EXEMPLAR-BASED SPEECH ENHANCEMENT FOR DEEP NEURAL NETWORK BASED AUTOMATIC SPEECH RECOGNITION
    Baby, Deepak
    Gemmeke, Jort F.
    Virtanen, Tuomas
    Van hamme, Hugo
    2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015: 4485-4489
  • [6] LOCAL TRAJECTORY BASED SPEECH ENHANCEMENT FOR ROBUST SPEECH RECOGNITION WITH DEEP NEURAL NETWORK
    You, Yongbin
    Qian, Yanmin
    Yu, Kai
    2015 IEEE CHINA SUMMIT & INTERNATIONAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING, 2015: 5-9
  • [7] Joint Training of Speech Separation, Filterbank and Acoustic Model for Robust Automatic Speech Recognition
    Wang, Zhong-Qiu
    Wang, DeLiang
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015: 2839-2843
  • [8] Deep Neural Network Based Spectral Feature Mapping for Robust Speech Recognition
    Han, Kun
    He, Yanzhang
    Bagchi, Deblin
    Fosler-Lussier, Eric
    Wang, DeLiang
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015: 2484-2488
  • [9] TOWARDS STRUCTURED DEEP NEURAL NETWORK FOR AUTOMATIC SPEECH RECOGNITION
    Liao, Yi-Hsiu
    Lee, Hung-yi
    Lee, Lin-shan
    2015 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2015: 137-144
  • [10] Joint Optimization of Spectro-Temporal Features and Deep Neural Nets for Robust Automatic Speech Recognition
    Kovacs, Gyorgy
    Toth, Laszlo
    ACTA CYBERNETICA, 2015, 22 (01): 117-134