JOINT ACOUSTIC FACTOR LEARNING FOR ROBUST DEEP NEURAL NETWORK BASED AUTOMATIC SPEECH RECOGNITION

被引:0
作者
Kundu, Souvik [1 ]
Mantena, Gautam [1 ]
Qian, Yanmin [2 ]
Tan, Tian [2 ]
Delcroix, Marc [3 ]
Sim, Khe Chai [1 ]
机构
[1] Natl Univ Singapore, Sch Comp, Singapore, Singapore
[2] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai, Peoples R China
[3] NTT Corp, NTT Commun Sci Labs, Kyoto, Japan
来源
2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS | 2016年
关键词
deep neural networks; joint factor learning; adaptation; bottleneck vectors; robust speech recognition; SPEAKER ADAPTATION; TRANSFORMATIONS;
D O I
暂无
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
Deep neural networks (DNNs) for acoustic modeling have been shown to provide impressive results on many state-of-the-art automatic speech recognition (ASR) applications. However, DNN performance degrades due to mismatches in training and testing conditions and thus adaptation is necessary. In this paper, we explore the use of discriminative auxiliary input features obtained using joint acoustic factor learning for DNN adaptation. These features are derived from a bottleneck (BN) layer of a DNN and are referred to as BN vectors. To derive these BN vectors, we explore the use of two types of joint acoustic factor learning which capture speaker and auxiliary information such as noise, phone and articulatory information of speech. In this paper, we show that these BN vectors can be used for adaptation and thereby improve the performance of an ASR system. We also show that the performance can be further improved on augmenting these BN vectors to conventional i-vectors. In this paper, experiments are performed on Aurora-4, REVERB challenge and AMI databases.
引用
收藏
页码:5025 / 5029
页数:5
相关论文
共 35 条
  • [1] Abdel-Hamid O, 2013, INT CONF ACOUST SPEE, P7942, DOI 10.1109/ICASSP.2013.6639211
  • [2] Albesano D, 2006, IEEE IJCNN, P1554
  • [3] Anguera X., 2014, BEAMFORMIT
  • [4] Acoustic beamforming for speaker diarization of meetings
    Anguera, Xavier
    Wooters, Chuck
    Hernando, Javier
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2007, 15 (07): : 2011 - 2022
  • [5] [Anonymous], 2015, REVERB CHALLENGE
  • [6] [Anonymous], 2011, IEEE 2011 WORKSHOP
  • [7] [Anonymous], 2014, TECH REP
  • [8] Multitask learning
    Caruana, R
    [J]. MACHINE LEARNING, 1997, 28 (01) : 41 - 75
  • [9] Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition
    Dahl, George E.
    Yu, Dong
    Deng, Li
    Acero, Alex
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2012, 20 (01): : 30 - 42
  • [10] Delcroix M., 2014, P REVERB MAY